Vol. 58, No. 3


May 1961 


Psychological Bulletin 





METHODOLOGY AND RESEARCH ON THE PROGNOSTIC 
USE OF PSYCHOLOGICAL TESTS 
SAMUEL C. FULKERSON and JOHN R. BARRY¹
Western Psychiatric Institute and Clinic, University of Pittsburgh 


There has not been a general re- 
view of the use of psychological tests 
in prognosis since Windle’s review in 
1952. At that time Windle concluded 
that, (a) it appeared to be some char- 
acteristic of the patient rather than 
the therapy given which determined 
the outcome of mental illness; (b)
most studies in the area were difficult 
to interpret due to inadequate specifi- 
cation of one or more of the following: 
the sample characteristics, the treat- 
ment schedule, the criteria of im- 
provement, and the degree of control 
imposed on variables influencing out- 
come; (c) the necessary step of cross- 
validation was usually omitted; and 
(d) personality tests, including the 
projective tests, had shown little 
promise in predicting outcome. 

The purpose of the present article 
is to bring the review of the research 
on the prognostic use of tests up-to- 
date and to deal with some related 
methodological issues. The scope and 
organization will depart from that 
used by Windle. Firstly, the present 
review covers a wider range of cri- 
teria. Windle considered primarily 
the problem of predicting improve-
ment. However, there seems to be a
complex of criteria which are closely 
related, logically and in practice, and 
so articles have been included dealing
with a variety of criteria other than
improvement. Secondly, the organ-
ization will differ from Windle’s. He
centered his review around individual
tests, taking each test in turn and
citing all prognostic studies where it
had been used. The present paper is
organized around the predictive prob-
lem rather than the individual test,
since in practice the clinician wants
to know how to come to a decision
about a patient rather than what can
be done with a given test. It is hoped
that this emphasis will help to point
up which questions are involved in
the area of prognosis, and the relative
attention each has received in re-
search. And finally, the emphasis on
decisions reflects an interest in deci-
sion theory (Luce & Raiffa, 1957),
which has recently been suggested
(Cronbach & Gleser, 1957) as a
promising frame of reference from
which to regard psychodiagnostic
testing.

¹ Acknowledgment is due the criticism and
additions of Charles Windle and Joseph Zubin.

Windle included studies from as 
early as 1926 through 1951. The pres- 
ent review mainly covers the period 
from 1952 through June 1959. The 
coverage is more complete for those 
sections dealing explicitly with the 
prognostic use of tests than for the 
sections on methodological problems. 
Only the major psychological and 
psychiatric journals have been re-
viewed exhaustively.


METHODOLOGY


The methodological difficulties in 
research on prognosis concern the re- 
searcher’s decisions as to what sam- 
ples he will use, what selection in- 
struments he will apply to the sam- 
ple, and what criteria seem most 
appropriate. 


Sample Attributes 

One of the primary methodological 
difficulties has been the definition of 
the sample. Psychiatric diagnosis 
appears to have been the predomi- 
nant basis of sample definition in 
spite of the known unreliability of 
these categories. Attributes of the 
sample such as age, education, sex, or 
socioeconomic status are usually 
listed. However, little attention has 
been paid in most of the studies re- 
viewed to achieving homogeneous 
samples or subsamples. Some in- 
vestigators have worked with only 
one diagnostic group, mainly schizo- 
phrenics. Since schizophrenia is a 
diagnosis given to over 50% of un-
specified functional psychotic dis-
orders, the differences between the re-
sults from such studies and those
using psychotics sampled at random
are hard to determine.

The need for homogeneous samples 
is clearly pointed up by a considera- 
tion of the question of base rates. 
Meehl and Rosen (1955) said “a 
psychometric device, to be efficient, 
must make possible a greater number 
of correct decisions than could be 
made in terms of the base rate alone”
(p. 194). Studies of base rate as a 
function of diagnostic category 
(Langfeldt, 1956; Pascal, Swensen, 
Feldman, Cole, & Bayard, 1953; 
Rennie, 1953) indicate wide fluctua- 
tion in outcome between categories. 
Examination of these base rates indi-
cates that a sample of psychotic pa-
tients with a preponderance of manic-
depressives would have a higher base
rate of improvement (approximately 
68%) than a sample consisting of 
schizophrenics (approximately 50%). 
A predictor, even though its actual 
validity was zero, could do much 
better predicting outcome in the 
first sample than in the second if the 
cutting point of the predictor was ad- 
justed to take advantage of the per- 
centage of improvement. The opti- 
mal chance percentage of correct pre- 
diction in the first sample (achieved 
by calling everyone improved) would 
be 68%, which is equal to the base 
rate. With a sample of any size this 
would differ significantly from 50%, 
the value likely to be designated as 
chance if one did not know the base 
rate. And if the relative effectiveness 
of a predictor in the two samples were 
tested it could appear, spuriously, 
that the predictor was 68% correct 
within one sample but only gave 50% 
correct prediction in the other. Thus, 
prognostic research designs which 
compare the results in the experi- 
mental group against statistical 
chance, or which compare two small 
groups that are not sufficiently 
matched on variables related to base 
rates, cannot result in useful informa- 
tion. Since effective handling of the 
problem of sample homogeneity is un- 
common in the prognosis studies re- 
viewed by Windle and ourselves, the 
generality of findings is low, or at 
best difficult to determine. 
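
The base-rate argument above can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the 68% and 50% improvement rates come from the text, whereas the sensitivity and specificity figures (.70 and .60) are hypothetical values chosen for the example, and the function names are not from the source.

```python
def best_blind_accuracy(base_rate: float) -> float:
    """Proportion correct when every patient is assigned the more common outcome."""
    return max(base_rate, 1.0 - base_rate)


def predictor_accuracy(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Overall proportion correct for a predictor.

    sensitivity: proportion of improved patients called "improved" (hypothetical input)
    specificity: proportion of unimproved patients called "unimproved" (hypothetical input)
    base_rate:   proportion of the sample that actually improves
    """
    return sensitivity * base_rate + specificity * (1.0 - base_rate)


def beats_base_rate(sensitivity: float, specificity: float, base_rate: float) -> bool:
    """Meehl and Rosen's requirement: more correct decisions than the base rate alone."""
    return predictor_accuracy(sensitivity, specificity, base_rate) > best_blind_accuracy(base_rate)


if __name__ == "__main__":
    # Base rates taken from the text; the test characteristics are assumed.
    samples = (("mostly manic-depressive", 0.68), ("schizophrenic", 0.50))
    for label, rate in samples:
        acc = predictor_accuracy(0.70, 0.60, rate)
        print(label, round(acc, 3), round(best_blind_accuracy(rate), 3),
              beats_base_rate(0.70, 0.60, rate))
```

Under these assumed figures the same predictor exceeds the blind strategy in the 50% sample but falls short of it in the 68% sample, which is why a comparison against a nominal 50% chance level can mislead.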

It has been assumed that homo- 
geneous sampling represents an effec- 
tive way of solving the problem of 
sample definition. However there is 
one danger. If the basis on which the 
homogeneity is established is highly 
related to the criterion variable, the 
variability of the criterion will be re- 
stricted. This can of course obscure a 
relationship that might exist between 
a predictor and criterion. It has been
tacitly assumed that adequate ran-
domization is not easily achieved in
prognosis research, considering the 
usual sample size and the biases of 
the clinical populations from which 
they are drawn; otherwise random 
sampling would be an efficient way to 
select, and thus operationally define, 
the sample. 
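
The danger of restricting criterion variability can be illustrated by simulation. The sketch below is a simplified, hypothetical demonstration and not an analysis from any study reviewed: it assumes a bivariate normal predictor and criterion with a true correlation of .50, and it treats the limiting case in which the "homogeneous" subsample is selected directly on a narrow band of the criterion.

```python
import math
import random


def pearson(xs, ys):
    """Plain product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_restriction(n=20000, rho=0.5, band=0.5, seed=0):
    """Return (r in the full sample, r in the range-restricted subsample).

    rho and band are assumed values for illustration only.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # predictor score
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)  # criterion
        pairs.append((x, y))
    r_full = pearson(*zip(*pairs))
    # "Homogeneous" subsample: only cases whose criterion falls in a narrow band.
    kept = [(x, y) for x, y in pairs if abs(y) < band]
    r_restricted = pearson(*zip(*kept))
    return r_full, r_restricted


if __name__ == "__main__":
    r_full, r_restricted = simulate_restriction()
    print("full sample r:", round(r_full, 2))
    print("restricted sample r:", round(r_restricted, 2))
```

In runs of this kind the correlation in the restricted subsample is markedly lower than in the full sample, even though the underlying relationship is unchanged.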

Another difficulty with diagnosis 
as a basis for the definition of the 
samples is that it represents clinical 
judgments which are based upon an 
often uncertain weighting of situa- 
tional and response variables. For 
instance, the diagnosis of depressive 
reaction typically requires a differ- 
entiation as to whether the affective 
response reflects anxiety or depres- 
sion, and a decision concerning the 
degree to which the affective response 
is related to a currently stressful 
situation. Clearly clinicians can vary 
as to the relative emphasis they place 
on these variables; and, as several 
studies (Glass, Ryan, Lubin, Reddy, 
& Tucker, 1956; Gleser, Haddock, 
Starr, & Ulett, 1954) have shown, 
they do vary in their weighting pro- 
cedures. Therefore, despite the con- 
venience of using diagnosis as a sam- 
ple-defining operation, it is weak in 
that the researcher loses some control 
over the basic stimulus and response 
elements upon which the judgment is 
based. Studies of the effects of these 
elements on test validity and on the 
efficiency of cutting scores are called 
for; this kind of research, frequent in 
personnel psychology, seldom ap- 
pears in the clinical journals. 


Tests 


Here the primary difficulty has 
been to define that universe of test 
behavior related to outcome. The 
majority of studies cited in Windle’s 
earlier review used standard tests, 
e.g., Rorschach, TAT, MMPI. It is
likely that these tests tap only a
small part of the response spectrum. 
With the welter available, it is still 
far from clear how many separate 
functions they sample, and no defini-
tive taxonomy of tests exists. Zubin
and his co-workers (Burdock, Sutton,
& Zubin, 1958; Burdock & Zubin,
1956; Zubin, 1958, 1959) have pro-
posed five broad categories of activity
to which test behavior can be as- 
signed: physiological, sensory, per- 
ceptual, psychomotor, and concep- 
tual. Each of these five categories 
has been further subdivided into 
classes of stimuli and responses. For 
the most part, the prognostic meas- 
ures which Zubin has selected to use 
within each category are simpler than 
such tests as the Rorschach, in the 
sense that they present fewer stimu- 
lus dimensions and require less elabo- 
rate and lengthy responses. Such 
systems of categorization may indi- 
cate a range of tests available, but 
within each category there is a de- 
gree of complexity which at this time 
is largely unknown. However, factor 
analyses have been carried out in the 
areas of perception (Thurstone, 
1944), psychomotor tests (Fleishman 
& Hempel, 1954, 1956; Hempel & 
Fleishman, 1955; Seashore, Buxton, & 
McCollum, 1940), and cognition 
(Guilford, 1956, 1959). Such anal-
yses afford at least a partial basis for
rational test selection. 

Since a number of studies (e.g., 
Conrad, 1954) indicate that severity 
of mental illness is a significant prog- 
nostic variable, researchers looking 
for simple tests for prognostic studies 
may find it of value to consider 
studies in the area of differential diag-
nosis. H. E. King (1954) was able to
differentiate between chronic schizo-
phrenics, subacute behavior dis-
orders, and normals using psycho-
motor tasks; and Eysenck, Granger,
and Brengelmann (1957), with
groups similar to those used by King, 
found a large number of both simple 
and complex perceptual tests which 
discriminated between their groups. 
Rabin and G. F. King (1958) re- 
viewed studies dealing exclusively 
with schizophrenia, and concluded 
that “relatively high discriminatory 
power... has been obtained with 
simple experimental tasks. In many 
cases it has been as good as or better 
than that found with more complex 
tasks” (p. 253).


Criteria 


There are three broad aspects of 
prognosis in mental illness: duration, 
course, and outcome. Studies pre- 
dicting duration have used criteria 
such as length of hospital stay, the 
amount of time spent on the admit- 
ting or disturbed ward before transfer 
to a less disturbed ward (Gordon, 
Lindley, & May, 1957), and length
of treatment.


Criteria involving the course of ill-
ness include measures of termination
and relapse. In inpatient settings
premature termination has been de- 
fined as leaving the hospital against 
medical advice; in outpatient settings 
it has been variously defined as not 
appearing for the initial interview 
after making an appointment, drop- 
ping out of therapy before some stipu- 
lated minimal number of contacts, or 
dropping out of therapy against the 
wishes of the therapist. 

All criteria of improvement have 
been classified in this paper as meas- 
ures of outcome. It could be argued 
that change over time is a measure of 
the course of illness, but this category 
has been reserved for specific qualita- 
tive aspects of change. Current cri- 
teria of improvement present the 
same difficulties in definition, and for 
this reason it is convenient to deal
with them together, and to separate
them from termination and relapse 
criteria. Because there is no univer- 
sally agreed upon definition of the 
term ‘mental illness’ (Jahoda, 1958; 
Scott, 1958), there has been a con- 
comitant lack of clarity about how to 
measure its alleviation. Three 
sources of improvement criteria are 
common: (a) ratings of improvement 
made by the therapist, the patient, or 
other persons in contact with the pa- 
tient such as relatives, professional 
staff other than therapist, or even 
fellow patients; (b) changes in objec-
tive measures of functioning such as
physiological changes, or improve-
ment in psychological test perform-
ance; and (c) follow-up data of a be-
havioral nature, such as whether pa-
tient is able to get and hold a job, to
get or remain married, or, in what- 
ever way, to resume a minimally in- 
dependent social existence. 

There have been several attempts 
to systematize these various outcome 
measures. An early breakdown of the 
separate areas of behavior which 
should be evaluated was made by 
Knight (1941). He suggested that 
therapists look for change in these 
five areas of adjustment: the disabl-
ing symptoms or problems, the inter-
personal relations, the sexual adjust-
ment, the productivity (i.e., the abil-
ity to work effectively and to utilize
available energy), and the ability to 
handle stress. In Zubin’s classifica- 
tion of tests, the ability to handle 
stress is viewed as a general param- 
eter which might apply to the other 
four areas. 

Barron (1953b) listed five similar 
criteria of improvement: (a) the pa- 
tient feels better—indicated by intro- 
spective comments by the patient; 
(b) the patient relates better to others
—requiring a follow-up at work, 
school, or home, and often based on
reports of members of the patient’s
social group; (c) the patient’s symp- 
toms clear up—as measured by 
psychiatric ratings of improvement 
at discharge, as well as indirectly by 
measures of duration, e.g., length of 
hospital stay and speed of transfer to 
minimum security wards; (d) the pa-
tient makes decisions in a health-
tending direction; and (e) the pa-
tient’s verbal behavior shows in-
creased “insight.”

A few other criteria have occasion- 
ally been proposed to supplement 
these. Winder (1957) has suggested 
changes in the adjustment of children 
of the patient, and Morse (1953) has 
proposed accessibility to psycho- 
therapy. Reznikoff and Toomey 
(1959) list in detail a variety of at- 
tempts to provide a taxonomy of out- 
come criteria. 

There are measurement problems 
in all of these approaches. Scott 
(1958) has pointed out several con- 
ceptual and methodological difficul- 
ties in the various definitions of 
mental health. His discussion can be 
applied to Barron’s criteria of im- 
provement in mental health: (a) 
apparent change in subjective feel- 
ings or symptomatology can be a 
function of change in environmental 
conditions or can be distorted by de-
fense mechanisms; (b) difficulties in
social relationships can be a function 
of the differing requirements of 
socioeconomic and cultural systems, 
and can change as the patient 
changes his community or his con- 
tacts in the community; (c) there can 
be disagreement over which is a 
health-tending direction, since value 
systems are frequently involved; and 
(d) changes in insight may be a func- 
tion of the degree to which the pa- 
tient is willing to conform to the 
theory and values of the therapist. It 
should be noted that these points
need not be regarded as criticisms of
the definitions. If, for instance, 
changes in subjective feelings are con- 
sidered important in their own right, 
then changes in feelings, whether due 
to environment or defense mecha- 
nisms, are still of interest. However, 
when used as criteria, such changes 
are meant to reflect specific intra-in-
dividual changes that are inde-
pendent of environmental or irrele-
vant personal factors. Despite this,
most of the research in prognosis 
seems designed to demonstrate only 
that characteristics of the patient 
exist which relate to outcome, with- 
out controlling sufficiently for the 
above mentioned environmental and 
personal factors. 

On a less general level, Parloff, 
Kelman, and Frank (1954) have 
listed several common sources of
ambiguity in improvement criteria:
(a) improvement is often treated as a
unitary concept, but this may be
erroneous; (b) the emphasis of the
rater can interact with aspects of the 
treatment—for instance, symptoms 
typically disappear before insight 
occurs, so that a rater who requires 
signs of insight before he gives a 
rating of improvement will judge 
fewer patients to be improved than 
one who accepts symptom allevia- 
tion as improvement; and (c) im- 
provement is likely to be overesti- 
mated, since patients fluctuate in be- 
havior, and at any given time signs of 
improvement in one or more specific 
areas are likely to be present and 
thus overvalued by a judge being 
asked to make a global, subjective 
rating. Pascal and Zax (1956) crit- 
icize the usual gross “improved-un- 
improved” criterion on the grounds 
that it is not sufficiently tailored to 
the specific desired changes of the pa- 
tient. They reject all nonbehavioral 
criteria of improvement, and essen-
tially appear to feel that symptom-
change should be the primary cri-
terion of improvement.

It would be valuable to know the 
factorial structure of the above 
course, duration, and outcome meas- 
ures. While no study was found 
which attempted to do this, several 
reported intercorrelations between 
two or more prognostic criteria. 
These will be described separately for 
the kinds of criteria involved. 

Correlations between outcome meas- 
ures. Kelman and Parloff (1957) 
intercorrelated a number of measures, 
including ratings of comfort and self- 
awareness made by the patient, and 
social effectiveness ratings made by 
persons close to the patient as well as 
professional observers. The change 
in rating from pretherapy to 20 weeks 
after the initiation of therapy was de- 
termined. Only 1 of 21 intercorrela- 
tions between these measures of 
change was found to be significant. 
However, the correlations were based
on an N of only 15, and the period of
time was perhaps too short to expect 
more than minimal changes. 


Storrow (1959, 1960) compared 
ratings of improvement made by 
therapists, patients, relatives of the 
patient, and a psychiatrist who had 
access only to abstracted material. 
Two related rating clusters were 
found: the patient’s self-rating, the 
relative’s rating, and the rating made 
by inexperienced therapists (third 
year medical students) formed one 
cluster; with the experienced thera- 
pist and the nontherapist psychia- 
trist forming the other. The correla- 
tions within clusters ranged from .61 
to .79; between clusters, .32 to .57. 
These two clusters seemed to reflect
primarily a dichotomy between pa-
tient and experienced therapist, since
the relatives, and apparently the in-
experienced therapists, gained their
impression from hearing the patient’s
views of his progress, while the non- 
therapist psychiatrist obtained his 
knowledge from the file written by 
the therapist. Storrow had the rat- 
ings made separately for each of 
Knight’s (1941) five areas, and the 
average intercorrelation between 
areas was approximately .60. Ells- 
worth and Clayton (1959) found that 
a measure of ward adjustment at dis- 
charge correlated significantly (.47) 
with a 3-month follow-up rating of 
community adjustment. However, 
amount of psychopathology at dis- 
charge had no relationship to the 
follow-up criterion. Their finding can 
be compared with the intercorrela- 
tion of .57 reported between two 
simultaneous ratings of adjustment 
made on different scales (Stilson, 
Mason, Gynther, & Gertz, 1958). 
Patient expressions of positive and 
negative feelings have been used as 
evidence of improvement (see Auld & 
Murray, 1955, for a review of these 
measures). Barry (1950) found low 
but significant correlations between 
these so-called internal or feeling cri- 
teria and global judgments of im- 
provement in adjustment. Rogers 
and Dymond (1954) have found that 
changes in patient self-ratings on 
Q sorts correlated with ratings and 
other criteria of improvement. In an 
analogous group research program 
Snyder (1953) reported that self- 
rating changes correlated signifi- 
cantly with judgments of improve- 
ment. The same results have been 
reported by Kalis and Bennet (1957). 
Taylor (1955) found that self-ratings 
(Q sorts) tend to become increasingly 
positive simply with the passing of 
time. This suggests that it is impera- 
tive to control for time in treatment 
in order to evaluate the actual extent
of the relationship between self-rat-
ings and other improvement criteria.

Correlations between duration and
outcome. Ullman (1957) reported that 
a measure of length of hospital stay 
correlated .36 (N = 72) with a meas-
ure of adequacy of interpersonal
relationships (Palo Alto Group 
Therapy Scale), those rated most 
adequate after a period of group 
therapy being the ones with the short- 
est hospital stay. Pascal et al. (1953) 
found a correlation of .37 (N = 486)
between length of hospital stay and 
ratings of improvement made a year 
after discharge; again, the greater the 
improvement, the shorter the hospi- 
tal stay. 

A significant positive relationship 
has been frequently reported (Bailey, 
Warshaw, & Eichler, 1959; Myers & 
Auld, 1955; Seeman, 1954; Sullivan, 
Miller, & Smelser, 1958) in which 
greater length of psychotherapy in 
outpatient settings is accompanied 
by judgments of greater improve- 
ment. An interesting exception to 
this is the phenomenon called the 
“failure zone.”

D. S. Cartwright (1955) found a 
grossly linear relationship between 
the number of psychotherapy ses- 
sions and success of outcome as noted 
by the therapists; but the mean suc- 
cess rating dropped sharply for those 
whose therapy lasted from 13 to 21 
interviews. Cartwright was report- 
ing on cases treated by nondirective 
techniques. Taylor (1956) validated 
this “failure zone” in a psychoana-
lytically oriented setting. Standal
and van der Veen (1957) obtained 
the same drop in a counseling center 
sample. Vosburg (1958), in an ex- 
amination of treatment charts, found 
evidence that from the fifteenth to 
twentieth hour was a period where 
outpatients tended to be preoccupied 
with their relationship with the 
therapist, suggesting that treatment 
which ended in this period might
often be due to a desire on the part of
either the patient or therapist to 
avoid the close, dependent relation- 
ship which was developing. Perhaps 
supplementing this, Ends and Page 
(1959) reported that the “flight into 
health” reaction occurred in group
psychotherapy uniformly around the 
fourteenth session. 

Correlation between duration and 
course. Crandall, Zubin, Mettler, and 
Logan (1954) found a significant rela- 
tionship between the duration of ini- 
tial hospitalization and rehospitaliza- 
tion; patients who stayed in the hos- 
pital a short time were most likely to 
still be out of the hospital on 1- to 4-
year follow-up.

To summarize these intercorrela- 
tions, patient self-ratings and thera- 
pist ratings appear to covary to a 
high degree. Although measures of 
duration and course of illness have 
some relationship to improvement 
ratings, they seem also to tap differ- 
ent sources of variance. 

Reliability. The reliability of out- 
come criteria has received attention; 
the duration and course measures are 
objective enough so that their reli- 
ability has been taken for granted. 
Miles, Barrabee, and Finesinger
(1951) reported low interjudge but 
high test-retest intrajudge reliability 
of global judgments of improvement. 
Ten cases were rated by four judges 
on a six-point scale. There was com- 
plete agreement for only 20% of the 
judgments, though no disagreement 
was by more than two points. Test- 
retest figures showed 70% to 74% 
complete agreement between ratings 
taken 6 to 8 months apart. The rat- 
ings were based on structured inter- 
view material, and probably repre- 
sent the lower bounds of interjudge 
agreement, if it is assumed that rat-
ings made after a long period of ob-
servation of the patient would show
more stability than ratings made on
the minimal information contained in 
a structured interview. These in- 
vestigators felt that changes in psy- 
chiatric status over time cannot be 
discriminated any more finely than 
in terms of three gross classes: un- 
changed or worse, improved, and 
markedly improved. Levitt (1957) 
presented data suggesting that 
judged improvement rate tends to in- 
crease as a function of the number of 
points on the scale. The greatest dis- 
crepancy was due to studies using a 
two-point “improved-unimproved”
scale, where the mean percentage im- 
proved was 51. Studies using three- to 
five-point scales had mean improve- 
ment rates of 73% to 76%. 

A possible source of unreliability in 
judgments of improvement lies in the 
fact that they may confound the 
amount of change with the absolute 
level of terminal adjustment. Thus it 
seems likely that the reason initial 
severity of illness correlates with im- 
provement (Conrad, 1954) is to some 
extent due to the fact that those who 
are high on a measure of adjustment 
initially will be high on adjustment 
terminally, though the change may 
be far from being as dramatic as for 
patients who are admitted in a state 
of confusion and disorientation, and 
discharged without these symptoms. 
Since each judge can combine amount 
of change and absolute level as he 
chooses, in most studies, a lowering 
of interjudge agreement is to be ex- 
pected. This may be involved in the 
much higher interrater reliabilities 
reported by Morton (1955) than by 
Miles, Barrabee, and Finesinger
(1951). Morton developed seven- 
point scales of absolute level of ad- 
justment in 12 different areas. After 
training, the interrater reliability co- 
efficients ranged from .79 to .91 when 
the ratings were based only on tran-
scriptions of a terminal interview;
and the reliability of the improve- 
ment score (the difference between 
ratings of an initial and terminal 
interview) ranged from .59 to .78. 
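
One standard psychometric account of why an improvement (difference) score tends to be less reliable than the level ratings from which it is computed is the classical formula for the reliability of a difference between two measures of equal variance. The sketch below applies that formula; it is offered as an illustration, not as Morton's own analysis. The level reliability of .85 lies within the range quoted above, but the correlations between initial and terminal ratings (.50 and .70) are assumed values, since the text does not report them.

```python
def difference_score_reliability(rel_initial: float,
                                 rel_terminal: float,
                                 r_initial_terminal: float) -> float:
    """Classical reliability of a difference score (terminal minus initial).

    Assumes the two ratings have equal variances.
    rel_initial, rel_terminal: reliabilities of the two level ratings
    r_initial_terminal: correlation between initial and terminal ratings (assumed input)
    """
    if not -1.0 <= r_initial_terminal < 1.0:
        raise ValueError("correlation must lie in [-1, 1)")
    mean_reliability = 0.5 * (rel_initial + rel_terminal)
    return (mean_reliability - r_initial_terminal) / (1.0 - r_initial_terminal)


if __name__ == "__main__":
    # Level reliability within the reported .79 to .91 range; correlations are hypothetical.
    for r in (0.50, 0.70):
        print(r, round(difference_score_reliability(0.85, 0.85, r), 2))
```

The higher the correlation between initial and terminal level, the lower the reliability of the change score, which is consistent in direction with the drop from the level coefficients to the improvement-score coefficients reported above.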

Tests as criteria. A possible cri- 
terion of outcome is performance as 
measured by tests. The present re- 
view uncovered no studies which 
used changes in test scores as primary 
prognostic criteria but it remains a 
reasonable possibility. The primary 
requisite for this use of tests would 
be evidence that the tests covary 
with the changes in patients that go 
to make up the concept of improve- 
ment. A number of studies have been 
published which tackle this question, 
and in general they support the as- 
sumption of covariation. 

Pascal and Zeaman (1951) found 
that the Bender-Gestalt, color-nam- 
ing, noun-naming, and serial sub- 
traction, from a larger battery of 
tests, correlated with the course of 
progress as judged clinically, for four 
patients getting electroconvulsive 
therapy. 

Hybl and Stagner (1952) reported 
a significantly greater decrease in the 
amount of disruption of performance 
brought about by a frustration ex- 
perience, for patients rated by their 
therapists as improved. The tasks 
were three psychomotor tests: the 
Ferguson Form Boards, Digit Sym- 
bol from the Wechsler-Bellevue, and 
the Minnesota Rate of Manipula- 
tion Test. 

Vinson (1952) administered a 
mirror drawing test before and dur- 
ing electroshock therapy to 18 in- 
patients. Change in the mirror draw- 
ing score correlated .72 with change 
in orientation as evaluated by the 
clinical staff. 

Several studies (Hozier, 1959; 
Wechsler, 1958) indicate that as 
psychotic patients improve there is a
decrease in variability of both the
quality and the quantity of test per- 
formance. 

The MMPI has been used in a 
number of studies of change: several 
studies (Carp, 1950; Feldman, 1951; 
Schofield, 1950, 1953) have reported 
that hospitalized patients treated 
with somatic therapies show an aver- 
age drop on all of the MMPI scales of 
from 8 to 13 T-scale points. The
acutely ill changed more than the 
chronically ill, and the affective dis- 
orders showed a greater change than 
the schizophrenics. Feldman (1951, 
1952) found that improved patients’ 
MMPI profiles dropped more than 
unimproved patients’ profiles, and 
that the averaged profiles of these 
two groups showed greater differences 
after therapy than before. Work 
with predominantly psychoneurotic 
samples (Barron & Leary, 1955; 
Kaufman, 1950; Schofield, 1950) has 
indicated a larger drop on most 
scales for improved patients than for 
those rated unimproved. Changes 
taken without regard to sign (de- 
creases as well as increases) were sig- 
nificantly greater in an individually 
treated group than in a group treated 
by group-therapy methods (Barron 
& Leary, 1955; Leary & Harvey, 
1956). 

Harris (1959) has summarized such
MMPI studies to date as follows:

scores on the MMPI show little change in 
normals and in untreated psychiatric patients 
over extended periods of time; somatic ther- 
apy, which is known to be effective at least in 
readying patients for discharge from the hos- 
pital, is accompanied by sizeable drops in test 
scores; patients in psychotherapy show smal- 
ler changes, perhaps not much larger than 
those produced by the passage of time alone; 
and the magnitude of change in test scores is 
related to clinical estimates of improvement 
(p. 519). (Quoted by permission of National 
Academy of Sciences-National Research 
Council)

Extraneous effects in test-retest
comparisons need to be kept in mind, 
and Windle (1954) has reviewed 
these in reference to questionnaires. 
He presents evidence for a general 
tendency toward less deviant answers 
on retest, irrespective of external 
factors. This tendency weakens as the
time period between test
administrations lengthens. But even taking
these artifactual sources of error into 
account, there appears to be evidence 
that a variety of test responses 
change in a manner consistent with 
therapist judgments of change in 
mental health. 


RESEARCH IN PROGNOSIS 


This section is organized around 
the three elements that seem most 
prominent in any treatment: the 
treatment itself, the person adminis- 
tering the treatment, and the patient 
who receives the treatment. Dura- 
tion, course, or outcome of illness 
can potentially be affected by any 
one of these. The practical need to 
determine the prognosis of a patient 
implies that some selection is pos- 
sible concerning the most appro- 
priate treatment for that patient, or 
the most appropriate patient for a 
given treatment. Thus in the head- 
ings below we use the terms: treat- 
ment selection, therapist selection, 
and patient selection. 


Treatment Selection 


Ideally, the basic problem in prog- 
nosis is the assignment of patients to 
treatments in such a way as to maximize
the total ratio of improved to
unimproved patients. In decision
theory terms, the prognostic judg- 
ment is a case of decision-making 
under conditions of certainty, which 
implies that the relationships be- 
tween treatments and effects or out- 
comes are known. However, it has
not been demonstrated that different
treatments have different effects. To 
quote an authority, 


One is reluctantly forced to admit that we 
simply do not possess the factual knowledge 
as of 1957 which permits us to say that we 
have any treatment procedure in psychiatry 
which promises a better outlook for a partic- 
ular illness than does nature left to her own 
devices (Hastings, 1958, p. 1057). (Quoted by 
permission of the American Journal of Psy- 
chiatry) 

Several attempts have been made 
to survey the literature on treatment 
effects, all of them hampered by the 
difficulties in comparing studies with 
different diagnostic groups, and 
different criteria for improvement. 
Eysenck (1952) selected 24 studies on 
the effect of psychotherapy with 
psychoneurotics, and concluded that 
these relatively homogeneous studies 
did not offer any evidence that im- 
provement rate for those receiving 
psychotherapy was greater than for 
those getting only custodial care. 
Methodological weaknesses in his 
survey were pointed out by Rosen- 
zweig (1954) and DeCharms, Levy, 
and Wertheimer (1954). 

Levitt (1957) surveyed 30 articles 
evaluating psychotherapy with chil- 
dren. He compared the improvement 
rate on discharge and follow-up for 
treated cases with that reported for 
children accepted for therapy who 
never appeared for a first interview. 
The results were similar to those 
found by Eysenck, and did not dem- 
onstrate any facilitation of recovery 
due to psychotherapy. 

Appel, Myers, and Scheflen (1953) 
summarized the results of studies 
which met a list of what they felt 
were minimal standards. They broke 
down the findings separately for 
schizophrenic, affective, and psychoneurotic
disorders. Their survey indicated
that none of the treatments
studied (insulin coma, electroconvulsive
shock, electronarcosis, lobotomy,
or psychotherapy) gave recovery
rates significantly greater
than that reported for groups re- 
ceiving only routine hospital care, 
in any of the three disorder cate- 
gories. A more recent review by 
Staudt and Zubin (1957) covering 
the somatotherapies indicated that 
insulin and electroconvulsive shock 
temporarily increase the improve- 
ment rate, but after 3 years the in- 
crease has dissipated. This conclu- 
sion would seem to fly in the face of 
the fact that most of the studies re- 
viewed by Staudt and Zubin reported 
significantly greater recovery for the 
treated group than for the control 
group at all periods of follow-up. 
However, the groups were equally 
different before treatment was begun; 
in most instances the control groups 
“seem to be highly selected and 
loaded with patients of apparently 
poor prognosis. Their improvement 
rates fall far short of the ‘spontaneous
improvement rates’” (Zubin,
1959, p. 344). This bias in selection 
of control groups is also likely to be 
operating in studies of psychotherapy 
unless matching procedures are pos- 
sible, since there seems to be a feeling 
in many clinics that ethical considera- 
tions make it mandatory that pa- 
tients who appear treatable be given 
treatment as quickly as possible.

Kramer and Greenhouse (1959)
discuss a point which bears directly 
on the adequacy of studies in this 
area. They show the statistical im- 
plications of the common sense no- 
tion that the less dramatic the effect 
one is looking for, the larger the 
sample necessary to show that it is 
significant. Their tables indicate 
that if one is interested in identifying 
in the experimental group as slight 
an improvement as 5% over the con- 
trol group (at the .05 level of signifi- 
cance) for base rate improvement of 
40% (which is close to that found in
schizophrenia) it would take at least
569 cases in each group. For a base 
rate improvement of 70% (typical of 
the psychoneurotic) 472 cases per 
group would be needed to demon- 
strate a 5% increase under ideal con- 
ditions. These estimates further 
assume perfect reliability of the im- 
provement criterion. Kramer and 
Greenhouse point out that very few 
states have a large enough population 
of mentally ill to do a study with a 
sample sufficient to detect slight but 
significant effects. Thus all the 
studies on the effect of treatment 
using small samples implicitly assume 
no interest in detecting anything less 
than extremely large differences. 


This is why it has been emphasized 
that treatment effects seem to be 
negligible relative to other variables 
in determining outcome; in view of 
the size of samples for research in 
this area it would not be fair to say 
that slight treatment effects may not
exist.
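
The sample-size argument of Kramer and Greenhouse can be sketched with the usual normal approximation for comparing two proportions. What follows is only an illustration: the function name cases_per_group, the one-sided test, and the power of .80 are assumptions that do not come from their paper, so the figures it gives need not agree with their tables.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_per_group(base_rate, gain, alpha=0.05, power=0.80):
    """Approximate cases needed in each group to detect `gain` over `base_rate`.

    Normal-approximation formula for two independent proportions with a
    one-sided test. The power of .80 is an assumed default, not a value
    taken from Kramer and Greenhouse (1959).
    """
    p_control = base_rate
    p_treated = base_rate + gain
    p_pooled = (p_control + p_treated) / 2

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = standard_normal.inv_cdf(power)

    spread_under_null = sqrt(2 * p_pooled * (1 - p_pooled))
    spread_under_alternative = sqrt(
        p_control * (1 - p_control) + p_treated * (1 - p_treated)
    )
    numerator = (z_alpha * spread_under_null + z_beta * spread_under_alternative) ** 2
    return ceil(numerator / gain ** 2)


if __name__ == "__main__":
    # Base rates of 40% and 70% are the two cited in the text.
    for base_rate in (0.40, 0.70):
        for gain in (0.05, 0.10, 0.20):
            print(base_rate, gain, cases_per_group(base_rate, gain))
```

Because the required number of cases varies inversely with the square of the gain, halving the improvement one hopes to detect roughly quadruples the sample needed in each group.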

How do patients regard psycho- 
therapy? Stotsky (1956a) found that 
only 10% of a VA sample mentioned 
psychotherapy when asked to list 
any treatments which helped them. 
When asked directly whether they felt
psychotherapy was the most impor- 
tant part of their treatment, over 
50% said yes. These patients came 
predominantly from a lower socio- 
economic class which, as will be dis- 
cussed later, would bias the results 
in the direction of more negative 
answers. 

Two final points can be made. It 
first should be said that clear-cut 
effects of psychotherapy seem to 
have been demonstrated using the 
patient’s verbal behavior, rather 
than judgments of improvement, as 
the criterion measure (Rogers & 
Dymond, 1954; Rosenthal, 1955). 

Secondly, it might be pointed out 
that the inconclusive state of affairs
regarding the effects of treatment is
not necessarily discouraging from the 
restricted point of view of the re- 
searcher. If treatment effects are 
currently less important than effects 
due to other sources of variance, 
then the researcher can ignore treat- 
ment differences in his samples and 
in the formulation of his hypotheses, 
thus considerably simplifying the re- 
search design. 

Therapist Selection 

A special aspect of treatment selec- 
tion is the question of what kind of 
therapist does best with what kind of 
patient in psychotherapy. In the 
years surveyed in this review the 
pertinent articles in this area dealt 
with such therapist variables as sex, 
vocational interests, professional affi- 
liation, and experience. 

Irrespective of cause, are there 
differences between therapists as to 
treatment results? Imber, Frank, 
Nash, Stone, and Gliedman (1957) 
compared three therapists, each of 
whom worked with 18 patients. No 
significant differences were found be- 
tween therapists, against a criterion 
of ratings of improvement in social 
effectiveness. Sullivan, Miller, and 
Smelser (1958) found neither sex, 
experience, nor profession (psychia- 
trist, psychologist, or social worker) 
to be related to either length of stay 
in therapy or to ratings of improve- 
ment. Hiler (1958a) reported signifi- 
cant differences in number of re- 
sponses on the Rorschach between 
six groups of patients (14 per group), 
each group subsequently treated by 
a different therapist. He interpreted 
this as indicating that the therapists 
differed in their ability to keep un- 
productive patients in therapy. 
Stieper and Wiener (1959) found 
significant differences between thera- 
pists as to the length of time they 
kept patients in therapy. The differences
seemed to be related to personality
variables in the therapist,
such as having high goals concerning 
very sick patients, and needing to 
feel appreciated. They took a nega- 
tive view toward this minority of 
therapists who keep patients in 
therapy for long periods: 

It seems to us likely that psychotherapeutic 
practice today contains self-defeating concepts 
which may not only be hampering to the success
of treatment, but potentially harmful to
its clients (p. 241). 


Betz and Whitehorn (1956) found 
differences in treatment between 
therapists who had a cumulatively 
high improvement rate with schizo- 
phrenics and therapists with a low 
improvement rate. The successful 
therapists were more active, em- 
phasized utilization of assets, un- 
derstood the meaning of the pa- 
tient’s behavior, and engendered 
more trust and confidence. They
also differed from unsuccessful therapists
in their scores on the Strong
Vocational Interest Test.

Myers and Auld (1955) found that 
the experienced staff in an out-pa- 
tient clinic had fewer patients quit 
against the therapist’s wishes, and 
more patients who improved, than 
the residents in the same clinic. Katz 
and Solomon (1958) concluded that 
in their sample the less experienced 
therapists tended to lose more pa- 
tients, but if the patient continued 
treatment, the improvement rate was 
as high as for the more experienced 
therapists. Strupp (1958) had 134 
residents and psychiatrists respond 
to a sound film of an initial interview. 
He interpreted his data as showing 
two types of therapists. Type I was 
positive in his feelings toward the pa- 
tient, optimistic about prognosis, and 
permissive and passive in therapy— 
and relatively inexperienced. Type 
II was more experienced, was negative
toward the patient, pessimistic
about prognosis, and active in therapy
(giving orders and advice, and
venting his irritations). Strupp 
quotes Kubie (1956) on reasons for 
this increasing pessimism: Kubie 
mentions his disappointment, saying 
it is one shared by other psychoana- 
lysts, to find that with increasing ex- 
perience he did not seem to have in- 
creasing success. 

Several studies (Katz, Lorr, & 
Rubinstein, 1958; Sullivan et al., 
1958) have reported that the more 
experienced the therapist, the larger
the percentage of cases rated by him 
as improved; and the less severe the 
illness, the greater the likelihood of a 
patient’s having an experienced
therapist. Clearly, it is advisable to 
control for severity of illness in re- 
search on therapy. Differences in 
socioeconomic level also appear to 
interact with experience. Schaffer 
and Myers (1954) studied all cases 
accepted for treatment in an out- 
patient clinic during 1 year and 
found that 


the higher a patient’s social class position . . . 
in the community, the greater were his chances 
of being accepted for psychotherapy, of being 
assigned to a relatively experienced therapist 
occupying a high status within the clinic, and 
of maintaining contact with the clinic (p. 88). 
(Quoted by permission of Psychiatry) 


It is apparently also likely (Winder 
& Hersko, 1955) that the higher the 
social position, the higher the likeli- 
hood that the therapist will decide on 
analytic rather than supportive pro- 
cedures. 

Since the above studies did not 
control for these contaminating 
factors, it must be concluded that 
demonstration of between-therapist 
effects on outcome has not been con- 
clusively obtained. This is not 
particularly surprising, in view of 
the fact that therapist selection is 
just a special case of treatment selec- 
tion. Again, though, it can be said
that effects can probably be shown,
against other than improvement cri- 
teria. For instance, Rosenthal (1955) 
found that the amount of benefit a 
client said he obtained from therapy 
correlated .68 with the degree of 
shift in moral values toward those 
held by the therapist, if the values 
had been talked about during psycho- 
therapy. This change would appear 
to be related to those obtained in 
laboratory studies on verbal condi- 
tioning (Krasner, 1958). 


Patient Selection: Outcome Criteria 


We turn now to the question of the 
relationship of intra-individual vari- 
ables to prognostic criteria. The 
studies will be grouped along two di- 
mensions. They will be considered 
according to the kind of criterion 
used—outcome, duration, or course— 
and further broken down, where pos- 
sible, in terms of the type of test 
used—projective, questionnaire, or 
performance (including cognitive
tests). 

Nontest indicators. Before turning 
to the research using tests as predic- 
tors of outcome, it is of interest to 
survey briefly what has been found 
using nontest variables. Huston and 
Pepernik (1958) reviewed prognostic 
variables in schizophrenia, and pre- 
sented evidence that only these vari- 
ables had been firmly established as 
going with favorable outcome: acute 
onset, short duration of illness prior 
to hospitalization, a precipitating 
stress, and the absence of flat or in- 
appropriate affect. A series of studies 
under the direction of Pascal in- 
vestigated the interrelationships of 
these variables within a sample of 
varied psychotics. It was found that 
acute onset (Swensen & Pascal, 
1954b) and aggression directed 
toward oneself (Feldman, Pascal, &
Swensen, 1954) related significantly 
to favorable outcome when other
prognostic variables were controlled.
However, precipitating stress (Cole, 
Swensen, & Pascal, 1954), affective 
expression (Bayard & Pascal, 1954), 
and duration of illness (Swensen & 
Pascal, 1954a) did not relate to out- 
come in their sample when the effect 
of other prognostic variables was held 
constant. The generality of their 
findings is not clear, since their 
method of balancing groups for con- 
trol purposes led to their using only a 
small portion of the total sample, 
thus allowing for the possible intro- 
duction of unknown biases. 

Eskey, Friedman, and Friedman 
(1957) could not find support for the 
notion that disorientation relates to 
duration of illness; however, they 
restricted their sample on the cri- 
terion variable by not using patients 
who were unimproved at discharge. 
Several studies (Eskey & Friedman, 
1958; Phillips, 1953) indicate that 
intact cognitive processes and a 
mature premorbid social and sexual 
life go with favorable outcome. Zubin 
(1959) presents the results to date of 
an uncompleted survey of prognostic 
indicators for schizophrenia, which 
suggests that the variables defining 
reactive schizophrenia go with favor- 
able prognosis, and those defining 
process schizophrenia go with un- 
favorable prognosis. He presents a 
valuable count of articles supporting 
or negating the postulated relation- 
ship for almost every if not every 
prognostic indicator that has been 
investigated. There have been sev- 
eral attempts to combine these vari- 
ables into a scale. Thorne (1952) 
intuitively combined five into a 
quantified prognostic scale. More 
recently Lindemann, Fairweather, 
Stone, and Smith (1959) have devel- 
oped a somewhat similar scale and 
cross-validated it against a criterion 
of duration of hospital stay. An 
eight-point scale (Schofield, Hathaway,
Hastings, & Bell, 1954) developed
to predict a follow-up criterion
of adjustment in schizophrenia could 
not be cross-validated by Stone 
(1959). Becker and McFarland 
(1955) developed and cross-validated 
a 16-item scale against a criterion of 
improvement in a lobotomized sam- 
ple. 

The above studies have dealt with 
psychotics, or samples predominantly 
psychotic. Miles, Barrabee, and 
Finesinger (1951) reported that in a 
hospitalized psychoneurotic sample, 
age of onset, duration of illness prior 
to hospitalization, and a number of 
symptoms were unrelated to out- 
come. Patients with symptoms asso- 
ciated with autonomic discharge were 
most likely to remit. Rosenbaum, 
Friedlander, and Kaplan (1956), 
studying an outpatient sample, found 
improvement occurred in patients 
with good premorbid history whose 
environment offered many supports; 
and improvement was mainly in 
marital and work adjustment. Com- 
parison of results on inpatient and 
outpatient samples suggests some 
reason for dealing separately with 
psychotics and psychoneurotics in 
prognosis research. 

An important question is how well 
the clinician, using these nontest 
indices, can do in predicting outcome. 
Clow (1953) obtained a majority 
opinion of prognosis at the staff con- 
ference which was held 2 months 
after admission on each of 100 female 
schizophrenics. The prognoses were 
73% correct in predicting a dichoto- 
mous improved-unimproved criterion 
obtained at discharge. More studies 
of this kind would be helpful in evalu- 
ating the practical usefulness of 
adding tests to current prognostic 
procedures. 

Projective tests. Several Rorschach
studies have used a configurational
score, the Prognostic Rating
Scale (PRS) (Klopfer, Kirkner, Wisham,
& Baker, 1951). Kirkner,
Wisham, and Giedt (1953) found a 
correlation of .67 between PRS and 
improvement ratings obtained by 
evaluating the terminal closure note, 
on a sample of 40 receiving psycho- 
therapy. Mindess (1953) obtained a 
correlation of .66 (N of 70) between 
PRS and a diagnostic criterion run- 
ning from normal through neurotic to 
psychotic, obtained 6 months after 
initiation of psychotherapy. Filmer- 
Bennett (1952, 1955) did not obtain 
significant results with either the 
PRS or global judgments based on 
the total Rorschach protocol. His 
criterion was a dichotomous im- 
proved-unimproved rating of the 
degree to which the patient was 
making a satisfactory social and 
vocational adjustment a year after 
discharge from the hospital. Rosalind 
D. Cartwright (1958) presented a 
review of several successful studies 
using the PRS, and described further 
positive results from her own study. 
The criterion was ratings of success of 
psychotherapy made by the counselor 
after termination of therapy. In an 
appended discussion of her paper 
Snyder argued that other tests might 
do as good a job with much less time 
needed for testing. Bloom (1956) 
added an interesting modification to 
his design. He divided his 46 subjects 
into two groups, an unproductive 
group (less than 11 Rorschach re- 
sponses) and a productive group (11 
or more responses). The PRS differ- 
entiated a dichotomous criterion of 
outcome of psychotherapy signifi- 
cantly in the productive group, but 
not in the unproductive. He further 
assessed 11 other scores, and found 
none which were either significant or
nonsignificant for the total sample as a
whole; all discriminated significantly 
in one or the other of his groups—four 
for the productive group, and seven
for the unproductive. His results
suggest the operation of an interac- 
tion similar to the one Zubin and 
co-workers (see below) have reported 
between chronicity and outcome, and 
deserve further investigation.

Rogers and Hammond (1953) and
Roberts (1954), both working with 
VA outpatients, tried a sign approach 
on the Rorschach with negative 
results. Dana (1954) hypothesized 
that Card IV, assumed to be most 
likely to pick up attitudes to author- 
ity, would give responses related to 
improvement in psychotherapy, if the 
authority relationship was crucial to 
outcome. The responses were placed 
in three categories (“adequate,” “inadequate,”
“negative”), and there
was a significant tendency for those
with “adequate” responses to improve,
and those with “inadequate”
responses to remain unimproved. 
Hammer (1953) felt that his review of 
the literature suggested that those 
patients whose Rorschach protocols 
look sicker than their H-T-P proto- 
cols have a good prognosis, while a 
poor prognosis is associated with 
giving more negative feelings on the 
H-T-P than on the Rorschach. 
Ullman (1957) found two highly
related measures—clinical judgments 
of TAT protocols and a social percep- 
tions test—to be correlated signifi- 
cantly with two criteria of improve- 
ment: the Palo Alto Group Therapy 
Scale and hospital status after 6 
months (hospitalized vs. discharged). 
S. Rosenberg (1954) developed and 
cross-validated eight prognostic signs 
based on the Wechsler-Bellevue, Sen- 
tence Completion, and on the Ror- 
schach. Grauer (1953) found more 
Rorschach indices of anxiety in an 
improved group of schizophrenics 
than in an unimproved. Organic 
signs did not discriminate. The 
welter of signs which these studies 
find related to improvement shows no
clear pattern. Obviously most of
these positive findings with projective 
techniques should be further vali- 
dated before they can be accepted as 
more than promising leads. 

Questionnaires. Barron (1953b) 
reported lower pretherapy MMPI 
and Ethnocentrism scores for an 
improved outpatient group than for 
an unimproved group. The criterion 
was judgments of change in psycho- 
therapy made by professionals who 
had not been involved in the treat- 
ment. At least some of these relation- 
ships were due to differences in IQ 
between the groups. Rosen (1954) 
was not able to verify Barron’s find- 
ing with the E Scale. Barron devel- 
oped a special ego strength scale from 
the MMPI (Barron, 1953a), which he 
successfully cross-validated against 
improvement criteria in three dispa- 
rate samples. Wirt (1955, 1956) 
found the ego strength scale signifi- 
cantly discriminated an unimproved 
from a greatly improved group, the 
groups being extremes drawn from a 
hospitalized sample receiving psycho- 
therapy. The scale did a better job of 
discrimination than experienced clini- 
cians who based their judgments on 
the total MMPI profile. 

Feldman (1951, 1952, 1958) ex- 
plored the validity of the MMPI for 
the prediction of outcome after 
electroshock therapy. He found that 
items dealing with hostility and inter- 
personal relationships were predictive 
of outcome, while items dealing with 
symptomatology reflected the amount 
of improvement. Pumroy and Kogan 
(1958) were unable to cross-validate 
Feldman’s prognostic scale in a small 
VA sample. Dana (1954) also ob- 
tained negative results with the 
MMPI, attempting to predict im- 
provement after electroshock. 

Performance tests. Stotsky (1956b) 
gave vocational aptitude and interest 
tests to a group of schizophrenics
most of whom had been in the hospital
for a year or more. The aptitude
tests predicted later work success, 
but the interest tests did not. Swen- 
sen and Pascal (1953) reported that 
the Pascal-Suttell Z score on the 
Bender-Gestalt test was significantly
lower for a group of inpatients
judged to be improved on follow-up a 
year and a half later, than for those 
judged unimproved. Landis and 
Clausen (1955) found efficient per- 
formance on critical flicker fusion, 
reaction time, finger dexterity, audi- 
tory acuity threshold, and tapping 
speed was predictive of improvement 
in an inpatient sample receiving a 
variety of treatments. A variability 
score of palmar sweating (Ellsworth 
& Clark, 1957) predicted changes in a 
behavioral adjustment scale concur- 
rent with the administration of tran- 
quilizing drugs. Keehn (1955) took 
12 measures from simple cognitive 
and psychomotor tests that had been 
shown to discriminate between nor- 
mals and psychotics, and found only 
one score that predicted outcome in a 
group of inpatients receiving insulin 
coma therapy; he concluded that 
initial degree of psychoticism was not 
prognostic of outcome. 

Vinson (1952) used a mirror draw- 
ing test to predict the prognosis made 
at discharge, a dichotomous “favorable-unfavorable”
prognostic judgment
made by the staff. His sample
consisted of 18 hospitalized patients 
who received electroshock therapy. 
He tested before and during treat- 
ment, and the difference between 
these scores predicted the prognostic 
criterion at the .02 level of signifi- 
cance. 

The most promising findings made 
in prognosis in the last 10 years have 
been reports coming out of the 
Columbia-Greystone project of two 
interaction effects. The first interaction
dealt with the relation of chronicity
to outcome. Windle and Hamwi
(1953) reported that chronic patients 
who were discharged after treatment 
had poorer admission scores on a 
complex reaction time test than 
chronic patients who were not dis- 
charged. However, for acute pa- 
tients, those whose illness was of 
short duration, the reverse was true, 
namely, poor admission scores were 
associated with poor outcome. Zubin, 
Windle, and Hamwi (1953) rechecked 
data on other tests, using chronic 
patients from the same study, and
found four other tests which gave the 
same results. An independent valida- 
tion was provided by Sonder (1955) 
using different tests. In all of these 
studies the results were most clear- 
cut for the chronic group, probably 
due to the fact that among the acute 
patients were some who were poten- 
tially or actually chronic. 
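
The form of this chronicity-outcome interaction can be made concrete with a small numerical sketch. The mean scores below are invented purely for illustration and are not data from the Columbia-Greystone reports; the names used and the assumption of equal-sized groups are likewise hypothetical. The sketch shows only how opposite relations in chronic and acute patients can cancel when the two groups are pooled.

```python
# Mean admission scores on a performance test (higher = better performance).
# INVENTED values for illustration only; not taken from any study cited here.
MEAN_ADMISSION_SCORE = {
    ("chronic", "discharged"): 40.0,
    ("chronic", "not discharged"): 50.0,
    ("acute", "discharged"): 55.0,
    ("acute", "not discharged"): 45.0,
}


def discharge_difference(chronicity):
    """Mean score of discharged minus not-discharged patients in one group."""
    return (
        MEAN_ADMISSION_SCORE[(chronicity, "discharged")]
        - MEAN_ADMISSION_SCORE[(chronicity, "not discharged")]
    )


def interaction_contrast():
    """Difference between the acute and chronic discharge differences."""
    return discharge_difference("acute") - discharge_difference("chronic")


def pooled_difference():
    """Discharge difference when chronicity is ignored (equal group sizes assumed)."""
    discharged = (
        MEAN_ADMISSION_SCORE[("chronic", "discharged")]
        + MEAN_ADMISSION_SCORE[("acute", "discharged")]
    ) / 2
    not_discharged = (
        MEAN_ADMISSION_SCORE[("chronic", "not discharged")]
        + MEAN_ADMISSION_SCORE[("acute", "not discharged")]
    ) / 2
    return discharged - not_discharged


if __name__ == "__main__":
    print("chronic:", discharge_difference("chronic"))
    print("acute:", discharge_difference("acute"))
    print("interaction contrast:", interaction_contrast())
    print("pooled:", pooled_difference())
```

With these invented values the test score relates to discharge in opposite directions in the two groups, yet shows no relation at all in the pooled sample, which is one way a study that ignores chronicity could report a negative finding.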

The second interaction emerged 
from the study by Zubin, Windle, and 
Hamwi (1953) who found that the 
chronic patients who did well on 
conceptual tasks (intelligence, mem- 
ory, personality tests) but poorly on 
perceptual tasks (learning and per- 
ception tests) had a poorer prognosis 
than chronic patients who showed 
conceptual confusion but perceptual 
clarity. Williams and Machi (1957), 
also working with the chronic sample 
from the Columbia-Greystone proj- 
ect, factor analyzed the test data, 
and found some support for this 
conceptual-perceptual differentiation. 
However, this finding is not yet as 
clearly supported by the evidence as 
the chronicity-outcome interaction. 
Zubin and Windle (1954) reviewed a 
number of independent prognostic 
studies, and reported that a consid- 
eration of the two interaction effects 
accounted for much of the conflicting 
findings. In the light of this work,
further attempts to investigate these
interactions cannot help but be of 
value. 


Patient Selection: Duration Criteria 


Projective tests. Stotsky (1952), 
working only with schizophrenics, 
compared a group of patients who in 
a 2-year period had not left the 
hospital with a group which in the 
same period of time had been dis- 
charged and remained outside for at 
least 6 months. His hypothesis was 
that the prognosis would be best for 
patients with the best pretreatment 
emotional and intellectual integra- 
tion. Of 19 Rorschach signs, 5 were 
significantly cross-validated in a second
sample. Also, all of the 19 signs
except R were found to be in the
predicted direction in both samples. 

Questionnaires. Grayson and Olinger
(1957), in a VA inpatient sample,
reported that those who were given
early trial visits were able to give
improved MMPIs when asked to
respond in “the way a typical, well-adjusted
person on the outside would
do” to a greater extent than those
still hospitalized after 3 months.
Rapaport (1958) was not able to 
validate this finding, using a military 
sample, although the change on most 
of the scales was in the correct direc- 
tion. Stieper and Wiener (1959) 
found a group of VA outpatients who 
were seen in psychotherapy for an 
average of 5.3 years had higher pre- 
therapy scores on the MMPI scales, 
Hs and Hy, than a group who were 
discharged after 14 months. 

A demographic study (Lindemann 
et al., 1959) found an index using 
marital status, diagnosis, degree of 
incapacity, legal competence, and 
alcohol intake as variables, was re- 
lated to length of hospital stay. Ells- 
worth and Clayton (1959) found a 
rating scale of psychopathology filled
out at admission did not correlate
significantly with length of hospital 
stay, but a behavioral adjustment 
scale did correlate, patients with the 
best admission adjustment tending to 
remain in the hospital the shortest 
length of time. 

Performance tests. Venables and 
Tizard (1956b) found “short-stay”
schizophrenics performed better on a 
repetitive psychomotor task than did 
chronic schizophrenics. Reaction 
time differences (Venables & Tizard, 
1956a) occurred on initial testing, but 
disappeared on retest. 

Patient Selection: Course Criteria 

Under criteria measuring the 
course of illness we have placed two 
broad questions: who will relapse, 
and who will terminate treatment. 

Relapse. The broad question here 
is one of predicting who will get worse 
over time. It is of course the reverse 
of the question of who will improve. 
However, the prediction of improve- 
ment and its opposite may not neces- 
sarily be most effectively accom- 
plished with the same test. It can not 
be assumed that the prediction of 
relapse or hospitalization can be 
made from the same tests which 
predict improvement. This is consistent
with the assumption that change
of mental status need not be a unitary
concept.

Peterson (1954b) used the MMPI, 
Wechsler-Bellevue, Rorschach, and 
nontest data to predict who would 
require admission to the hospital 
from patients being seen on an out- 
patient basis in a VA mental hygiene 
clinic. Considering the base rates, the 
predictive power of the tests was 
slight, but the results suggested that 
the person who gets worse in therapy 
is single, has been previously hospi- 
talized, is diagnosed psychotic, and 
has an MMPI profile strongly elevated
on the psychotic scales. Using
a six-point scale based on signs of 
psychosis on the MMPI developed by 
Meehl, Peterson (1954a) was able to 
achieve 75% correct discrimination. 
Briggs (1958) was able to cross- 
validate this scale to a certain extent. 
He took patients who were already in 
the hospital when they received the 
MMPI. On follow-up he found the 
Peterson score differentiated those 
who were rehospitalized from those 
who were not only for patients origi- 
nally diagnosed psychoneurosis or 
mixed psychoneurosis. This is con- 
sistent with Peterson’s finding that in 
his study similar outpatient diagnoses 
were most often given to the cases 
which were later hospitalized. 

Schofield and Briggs (1958) related 
several measures of improvement 
previous to initial discharge to rehos- 
pitalization, the median follow-up 
period being 5.8 years. Improvement 
in behavior ratings made by nurses 
was not related to rehospitalization, 
but a combination of ratings based on 
pre- and posttreatment MMPIs and 
psychiatric evaluations of improve- 
ment made at the time of discharge 
allowed 75% correct prediction for 
the 66% of cases on which the two 
ratings agreed. Since knowledge of 
the base rate alone would allow 66% 
correct prediction, this was only 
slightly better than chance. 
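The comparison with the base rate can be made explicit. The following is a minimal sketch only: the function name and the error-reduction index are our own illustrative assumptions, not part of the studies reviewed. It expresses a hit rate as a gain over what knowledge of the base rate alone would allow.

```python
def gain_over_base_rate(hit_rate, base_rate):
    """Return (absolute gain, proportional reduction in errors).

    hit_rate  -- proportion of cases the predictor classifies correctly
    base_rate -- proportion correct when every case is assigned to the
                 more frequent outcome
    Both names are illustrative; they do not come from the studies cited.
    """
    if not (0.0 <= hit_rate <= 1.0 and 0.0 <= base_rate < 1.0):
        raise ValueError("rates must be proportions, with base_rate < 1")
    absolute_gain = hit_rate - base_rate
    # Share of the errors made under the base-rate rule that the predictor removes.
    error_reduction = absolute_gain / (1.0 - base_rate)
    return absolute_gain, error_reduction


# Figures quoted above: 75% correct against a 66% base rate.
gain, reduction = gain_over_base_rate(0.75, 0.66)
print(round(gain, 2), round(reduction, 2))
```

On these figures the gain is 9 percentage points, or roughly a quarter of the errors that the base-rate rule would make, and it applies only to the cases on which the two ratings agreed.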

Cowden, Deabler, and Feamster 
(1955), using a criterion of whether the
patient was rehospitalized within 90
days after discharge, reported judg- 
ments of change from admission to 
discharge on Sentence Completion 
and the H-T-P Test predicted the 
criterion. An “ego” score obtained 
from combining the Binet Vocabu- 
lary with Cards I, III, and VIII of 
the Rorschach predicted relapse 
within a 2-year period for a sample of 
discharge patients (Orr, Anderson, 
Martin, & Philpot, 1955), but did not
predict discharge for a sample of non-
deteriorated admissions. Working
with a special group (outpatients con- 
sidered interminable) Wiener (1959) 
studied return to psychotherapy over 
a 6-month period after initial psycho- 
therapy was arbitrarily terminated. 
In his sample of 48, 37 returned for 
further therapy within this period. 
The MMPI did not discriminate 
returnees from nonreturnees. Months 
in treatment appeared to be a promis- 
ing measure, with the returnees 
having a longer history of psycho- 
therapy. 

A study that fits under neither of 
our two course criteria is one by 
Rioch and Lubin (1959). They ob- 
tained lengthy follow-up data on 93 
patients, sufficient to allow an assess- 
ment on an 11-point scale of how 
consistently the patient had moved 
upward or downward in his social 
adjustment over several years. Both 
the Wechsler-Bellevue IQ and a 
global rating based on the Rorschach 
correlated significantly with this cri- 
terion, mainly due to discrimination 
at the low end of the scale: all of the 
patients who deteriorated steadily 
had low scores on the predictors. 

Termination of treatment. The 
criterion involved in the prediction of 
length of therapy is more objectively 
determined than improvement, but 
there are some difficulties in its deter- 
mination nonetheless. 

One question is how to measure 
length of therapy. Most studies have 
used the number of interviews as the 
measure. Number of weeks in treat- 
ment would appear to be an equiva- 
lent measure. However, Lorr, Katz, 
and Rubinstein (1958) found that the 
number of interviews correlated only 
.60 with number of weeks in treat- 
ment, and they argued that number 
of interviews is likely to be the less 
reliable of the two. 

Another problem springs from the
research design used in most of the
studies of termination. The total 
sample is usually divided into two 
groups, terminators and remainers, 
and test scores are related to this 
dichotomous criterion. The question 
becomes one of where to cut the dis- 
tribution. Terminators have been 
defined as those remaining less than 4 
sessions (Gliedman, Stone, Frank, 
Nash, & Imber, 1957), less than 10 
sessions (Auld & Eron, 1953; Kotkov 
& Meadow, 1953), or less than 20
sessions (Gibby, Stotsky, Hiler, & 
Miller, 1954). Gibby et al. (1954) 
found that those terminating between 
5-19 sessions resembled in their test 
responses those who terminated ear- 
lier rather than those continuing on 
for more than 19 sessions. Our previ-
ous discussion of the “failure zone”
(Taylor, 1956) suggests that a variety 
of factors are operating in the first 20 
weeks. When these factors have not 
been controlled, they can influence 
the findings in termination studies. 
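The effect of the cutting point can be illustrated with a short sketch. The session counts below are hypothetical and serve only to show how the same patients change category as the cutoff moves among the values used in the studies cited (4, 10, and 20 sessions); the function name is likewise our own.

```python
def label_terminators(sessions_attended, cutoff):
    """Label a patient 'terminator' if seen for fewer than `cutoff` sessions."""
    return ["terminator" if n < cutoff else "remainer"
            for n in sessions_attended]


# Hypothetical session counts for eight patients (not data from any study).
sessions = [2, 3, 6, 9, 12, 18, 25, 40]

for cutoff in (4, 10, 20):  # cutting points used in the studies cited above
    labels = label_terminators(sessions, cutoff)
    print(cutoff, labels.count("terminator"), "of", len(sessions))
```

With these invented counts the number of terminators rises from two to four to six as the cutoff moves from 4 to 10 to 20 sessions, so any test score related to the dichotomy is being validated against a different criterion in each case.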

A further criticism has been made 
by Gundlach and Geller (1958) who 
suggest that termination rate and 
duration of illness are partly adminis- 
trative artifacts, and partly a reflec- 
tion of “the kind of personality 
problems that the staff are interested 
in, or skilled at, handling.” This
criticism can be taken as indirect 
support for the common practice of 
defining termination in terms of the 
distribution of the length of therapy 
measures, since in any given setting, 
the median or mean length takes 
some account of the effects of policy 
and staff interests. 

Research on the prediction of ter- 
mination by the use of projective tests 
shows a familiar, monotonous pat- 
tern: initial positive results with 
subsequent negative or indeterminate 
cross-validation. Kotkov and Meadow
(1952, 1953) began with 12
formal scores, and validated one of
these (FC/CF). They applied a
formula based on three scores
(FC/CF, R, D%) to another sample, and
D% washed out. When these same
signs are examined in an earlier study 
(Rogers, Knauss, & Hammond, 
1951), none were significant, and only 
R was in the predicted direction. 
Auld and Eron (1953) tried a further 
validation of the Kotkov and Mead- 
ow formula, and obtained insignif- 
icant results. They found the Wechs-
ler-Bellevue IQ accounted for the one
Rorschach variable, R, which held up 
in their sample. 

Starting anew, Gibby et al. (1954; 
Gibby, Stotsky, Miller, & Hiler, 
1953) found 9 of 31 Rorschach signs 
promising. Taking the 9 to a second 
sample, 3 held up (R, K, m) and a 
predictive formula based on these 
variables was applied to a further 
independent sample, and afforded 
68% correct prediction. However, 
knowledge of the base rate would 
have allowed 60% correct prediction, 
so the results were not strong enough 
to be of practical use. In their sample 
the Kotkov and Meadow formula did 
no better than chance, and IQ was 
not related to the criterion. Affleck 
and Mednick (1959) used an equation 
based on R, M, and H to predict who 
would remain for longer than three 
interviews. Their equation allowed 
71% correct prediction in a valida- 
tion sample. Their terminators were 
lower in IQ than the continuers (sig- 
nificant at .06 level). This is consist- 
ent with the findings of Auld and 
Eron (1953). 

All of the above Rorschach studies 
except for Auld and Eron used equiv- 
alent VA males being seen on an out- 
patient basis, so in some respects 
sample homogeneity was better from 
study to study than is true of most 
validation research in this area. Of 
all the Rorschach signs only R seems 
to have maintained its promise in
these studies. More recent work
(Gallagher, 1953, 1954; Taulbee, 
1958) supports the conclusion that 
the number of Rorschach responses 
(R) relates to termination. However,
the Rorschach is probably an unnec- 
essarily cumbersome way of measur- 
ing this variable; for instance, Gal- 
lagher (1954) found that the number 
of words used on the Mooney Prob- 
lem Check List to describe the cli- 
ents’ problems was a better predictor 
than R. 

Libo (1957) used a TAT-type test 
to predict the number of patients who 
would return the week after the test 
was administered. For 40 subjects he 
was able to make a significant predic-
tion based on an “attraction score”:
the number of references in the stories 
to a desired move toward the thera- 
pist, or of anticipated satisfactions 
from therapy. 

Three studies dealt with the predic- 
tion of termination in a tuberculosis 
hospital. Vernier, Whiting, and
Meltzer (1955) were able to differ-
entiate patients who left the hospital
against medical advice from those
who continued treatment to the end,
using the Rorschach and H-T-P tests.
The TAT did not discriminate.
Moran, Fairweather, and Morton 
(1956), using a biographical inven-
tory and an attitude questionnaire,
found that only prehospital life
adjustment predicted who would 
leave the hospital prematurely, with 
those leaving having a long history of 
being unable to adjust to their life 
situations. Calden, Thurston, Stew- 
art, and Vineberg (1955) developed 
and cross-validated a scale from the 
MMPI to predict premature dis- 
charge. 

Taulbee (1958) developed a key 
based on the MMPI and the Ror- 
schach to predict continuation of 
outpatient psychotherapy beyond the 
thirteenth interview. His results, not
cross-validated, led him to conclude
that those who continue in therapy 
are less defensive, and more persist- 
ent, dependent, anxious, and intro- 
spective than the terminators. Sulli- 
van et al. (1958) reported no signifi- 
cant difference between MMPI scores 
of terminators and continuers in a
VA male sample. Of a number of 
variables only education and occupa- 
tion related to the criterion. Conrad 
(1954) had therapists fill out a check 
list covering positive mental health, 
social conformity, and behavior pa- 
thology on VA outpatients with 
differing lengths of stay in psycho- 
therapy. Continuers tended to look 
least disturbed initially, and to be at 
the median rather than at either 
extreme on social conformity. 

Rubinstein and Lorr (1956) found 
differences between extreme groups 
(patients in psychotherapy for over 6 
months vs. patients who had come 
less than six times and had termi- 
nated against the wishes of the thera- 
pist), on the authoritarian F Scale, 
and a vocabulary test. However, a 
later study (Lorr et al., 1958) which 
defined termination as having less 
than 7 weeks of psychotherapy, did 
not give significant results, though 
the scales were in the predicted direc- 
tion. They combined a number of 
scales in a further attempt, and 
obtained a significant multiple corre- 
lation in a validation sample. How- 
ever, the scales allowed no better 
prediction than interviewer's judg- 
ment. 

A large recent project on termina- 
tion was carried on at Johns Hopkins 
University (Frank, Gliedman, Imber, 
Nash, & Stone, 1957; Gliedman et al., 
1957; Imber, Frank, Gliedman, Nash, 
& Stone, 1956; Imber, Nash, & Stone, 
1955; Nash, Frank, Gliedman, Imber, 
& Stone, 1957). Their prognostic 
battery included an inventory and a 
Sway test. Those who stayed in
therapy more than three interviews
were more suggestible on the Sway 
test, were more sociable, of higher 
socio-economic status, and more likely 
to see treatment as a means of main- 
taining status in their immediate so- 
cial environment, than the termina- 
tors. When they compared group 
versus individual psychotherapy they 
found an interaction between treat-
ment and termination: in group ther-
apy, the terminators were more so- 
cially ineffective than the continuers, 
while the relationship was reversed 
for those getting individual therapy. 
This intriguing finding may have been 
related to an unequal distribution of 
social levels in the two groups—most 
of the lower class patients ended up in 
group psychotherapy, while most of 
the middle class patients were as- 
signed to individual psychotherapy. 

Hiler (1959) studied initial com-
plaints, and concluded that continu-
ers come to a clinic with typical psy- 
choneurotic symptoms—obsessions, 
phobias, anxiety, depression, poor 
concentration—while early termina- 
tors are more likely to list purely 
organic symptoms, antisocial acts, or 
schizoid feelings. His continuers also 
obtained higher scores on the Wechs- 
ler-Bellevue with a subtest pattern 
characterized by Similarities being 
higher than Digit Span or Digit 
Symbol (Hiler, 1958b). 

How much overlap is there be- 
tween predictors of termination and 
improvement? Sullivan et al. (1958) 
investigated the relationship of 
MMPI scores and demographic vari- 
ables to both improvement and ter- 
mination criteria. Only occupational 
level was related significantly to both. 
Katz et al. (1958) found none of their 
predictors of length of stay correlated 
with therapist ratings of improve- 
ment. Frank et al. (1957) reported 
that a past history of social activity 
and a fluctuating course of illness was
associated with continuation and
improvement. A short duration of 
illness was associated with termina- 
tion as well as improvement. Gal- 
lagher (1954) found the Taylor Mani- 
fest Anxiety Scale predicted contin- 
uation as well as improvement. In 
general the results suggest little over- 
lap. This is somewhat unexpected, 
since as was mentioned earlier, there 
appears to be a positive relationship 
between criteria of duration of treat- 
ment and improvement. The most 
tenable assumption would seem to be 
that the variance shared by the two 
criteria is different from the variance 
shared by predictor and criterion. 
Possibly the correlation between 
criteria is due to rater bias. 


DISCUSSION 


The previous sections of this paper 
have included the word “selection” in
order to underline the fact that the 
practical need to predict to any of 
these criteria exists only when some 
sort of selection is necessary. For 
example, if the waiting list of an 
outpatient clinic is too long, selection 
of cases to receive treatment can be 
made on the basis of predicted prob- 
ability of improving or terminating. 
If there is no need to deny treatment 
to anyone, knowledge of these prog- 
nostic probabilities is of no practical 
use. In most mental treatment cen- 
ters today administrative procedures 
probably do not involve rejection of 
the patient as an alternative action, 
except in some outpatient clinics. 
Prognosis would be indispensable in 
the question of treatment selection, if 
differential effects of treatment were 
known; our survey has suggested that 
such effects have not yet been dem- 
onstrated. Thus it could be argued 
that prognosis is a sleeping giant at 
the present time, awaiting a future 
chance to be of service. Several other 
uses can be made of prognostic infor-
mation, of course. Knowledge of the
variables which relate to changes in 
duration, course, or outcome of 
mental illness is of theoretical impor- 
tance, an aid to understanding. A 
second promising use has been pro- 
posed by Feldman (1952) and Zubin 
(1959). They recommend that in 
nonprognostic research prognostic 
status be tried as a method of classi- 
fying patients into homogeneous 
categories, in place of diagnosis. 

Is such a suggestion tantamount to 
substituting a measure of severity of 
illness for one of type of illness? The 
literature survey indicates a wide 
variety of tests have shown positive 
results, with no discernible common 
characteristic except that they meas- 
ure adequacy of functioning, directly 
or indirectly. The fact that the same 
measures do not predict for all pa- 
tients may be due to differences in the 
type and etiology of symptoms from 
patient to patient; but such differ- 
ences do not vitiate the possibility 
that when prediction occurs it is 
largely because the dimension of 
severity of illness has been accurately 
assessed by the test. In any case, the 
effect of matching groups on prog- 
nostic variables would be to control 
for base rate differences in improve- 
ment, a procedure which is impera- 
tive for many kinds of evaluational 
studies, though rarely invoked in 
research on therapy. 

As with all predictive questions, 
the primary problem in prognosis is 
the definition of the criterion. From 
the point of view of decision theory, 
the general notion of “outcome of
illness” involves assigning utility
values to specific outcomes; and since 
cost of achieving any given outcome 
may be a factor, an explication of the 
treatment strategies is also necessary. 
The low interjudge reliabilities which 
obtain in judgments of improvement 
indicate that utility of outcome may
differ from judge to judge. A pro-
gram for achieving a more objective 
ranking of treatments, outcomes, 
or treatment-outcome combinations 
seems called for. Cronbach and 
Gleser (1957) offer a possible frame- 
work for such a program, and most of 
the points they make, although deal- 
ing with personnel selection, can be 
easily generalized to prognosis. 

A frequent misinterpretation of 
empirical research is that it is based 
on no theory. In the sense of a con- 
tent theory—i.e., a theory stating 
relationships between tangibles or 
concepts related to tangibles—em- 
pirical research is usually weak, 
though in the selection of measures 
some sort of rough theory has to 
be involved. However, empirical re- 
search often is strongly tied to a 
mathematical model. In prognosis 
the guiding model has been the linear 
regression model. The studies have 
assumed that a measurable quality 
exists which is linearly related to out-
come. The findings in respect to per-
formance differences between acute 
and chronic patients (Burdock et al., 
1958) suggest that this linear model 
probably will have to take account 
of interaction effects. If so, almost 
all studies to date are too simple in 
design. They involve a one-stage de- 
cision: look at one final score per per- 
son (the final score may of course be 
a combination of several subsidiary 
scores) and assign the patient to an 
outcome (criterion category) by 
whatever rule of operation is being 
applied to the score. The work of 
Zubin’s group indicates that at least 
a two-stage decision process is 
needed: (a) a score is obtained to 
decide which of several operations 
will be applied to a second score, 
and (b) the second score is used to 
assign patients to the criterion cate- 
gory. Indeed there is no reason why 
tests should not be useful as a basis
for deciding what operational rule
to apply to other data. The variables 
which appear to have the strongest 
relationship to outcome have been 
nontest variables: severity and dura- 
tion of illness, acuteness of onset, 
degree of precipitating stress, etc. 
A possible direction of research 
might be to use tests to increase the 
validity of the nontest variables, 
either by trying to find tests which 
tap interactions, or which correlate 
with the error term in the psychiatric 
predictor. This latter approach has
not been tried in prognosis, but it has
been used with some success in per-
sonnel selection (Fulkerson, 1959; 
Ghiselli, 1956). A third suggested 
avenue of research would be to apply 
nonlinear or configurational models 
to prognostic data. The general point 
to be made is that prognosis research 
seems to require a different, more 
complex, mathematical model, and 
thus a more complex research design, 
than has been generally used so far. 
Specifically the one-stage design, 
where a predictor is correlated with 
an outcome measure, would appear to 
be inadequate in this field. 
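The difference between the one-stage and the two-stage decision can be sketched in a few lines. Everything specific in the sketch is hypothetical: the use of duration of illness as the first score, the 2-year split between acute and chronic, the cutoff of 50, and the reversal of direction for chronic patients are illustrative assumptions, not findings.

```python
def one_stage(score, cutoff):
    """One-stage rule: compare a single final score with a fixed cutoff."""
    return "improved" if score >= cutoff else "unimproved"


def two_stage(years_ill, score, rules, chronic_after_years=2.0):
    """Two-stage rule.

    (a) A first score (here, duration of illness) decides which rule applies.
    (b) The selected rule assigns the patient to a criterion category.

    `rules` maps a group label to (cutoff, high_scores_favorable).
    The 2-year split and the rule table are hypothetical.
    """
    group = "chronic" if years_ill >= chronic_after_years else "acute"
    cutoff, high_scores_favorable = rules[group]
    is_high = score >= cutoff
    return "improved" if is_high == high_scores_favorable else "unimproved"


# Hypothetical rule table: the same test score is read in opposite
# directions for acute and chronic patients (an interaction effect).
RULES = {"acute": (50, True), "chronic": (50, False)}

print(one_stage(60, 50))          # one rule for every patient
print(two_stage(0.5, 60, RULES))  # acute patient
print(two_stage(5.0, 60, RULES))  # chronic patient
```

Under these assumptions the same score of 60 leads to opposite assignments for the acute and the chronic patient, a pattern that no single cutoff on one score can reproduce.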
COMPLEX SOUNDS AND CRITICAL BANDS¹


BERTRAM SCHARF 
Northeastern University 


Studies of the responses of human 
observers to bands of noise and other 
complex sounds have led to the meas- 
ure of what appears to be a basic unit 
of hearing, the critical band. When 
the frequency spectrum of a stimu- 
lating sound is narrower than the 
critical band, the ear reacts one way; 
when the spectrum is wider, it reacts 
another way. For example, experi- 
ments show that at values less than 
the critical bandwidth, both loud- 
ness and absolute threshold are in- 
dependent of bandwidth; only when 
the critical bandwidth is exceeded 
do the loudness and the absolute 
threshold increase with the width 
(Gässler, 1954; Zwicker & Feldtkel-
ler, 1955; Zwicker, Flottorp, &
Stevens, 1957). 

The critical band has also been 
measured in experiments on auditory 
discriminations that seem to depend 
upon phase (Zwicker, 1952) and in 
experiments on the masking of a 
narrow band of noise by two tones 
(Zwicker, 1954). In all four types of 
experiment—loudness, threshold, 
sensitivity to phase, and two-tone 
masking—the value of the critical 
band is the same function of its 
center frequency. The values of the 
critical band, as a function of the fre- 
quency at the center of the band, are 


1 This paper was completed while the author 
held a research grant, B2223, from the Na- 
tional Institute of Health, United States Pub- 
lic Health Service. The author wishes to 
thank S. S. Stevens for detailed criticism and 
advice in the final preparation of the manu- 
script. A. B. Warren and H. S. Zamansky are 
also thanked for reading and making sugges- 
tions about the final draft. 


given by the top curve in Figure 1. 
The ordinate gives the width (ΔF),
in cycles per second, of the critical 
band; the abscissa gives the center 
frequency. As the frequency at the 
center of a complex sound increases, 
the critical band that is measured 
around the center frequency becomes 
wider. 
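The growth of the critical band with center frequency can be sketched numerically. The sketch below does not use the measured values of Figure 1; it assumes a later analytic approximation published by Zwicker and Terhardt (1980), which is not part of this review, and is offered only to illustrate the general shape of the curve. The function name is our own.

```python
def critical_bandwidth_cps(center_frequency_cps):
    """Approximate critical bandwidth (cps) at a center frequency (cps).

    Assumption: the analytic approximation of Zwicker and Terhardt (1980),
    25 + 75 * (1 + 1.4 * f**2) ** 0.69 with f in kilocycles per second.
    It is not taken from the article and only illustrates the curve's shape.
    """
    f_kc = center_frequency_cps / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_kc ** 2) ** 0.69


for f in (250, 500, 1000, 2000, 4000):
    print(f, round(critical_bandwidth_cps(f)))
```

At 1000 cps the approximation gives a width of about 162 cps, and the width grows steadily as the center frequency rises.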

Not only does the critical band 
have the same values when measured 
for several kinds of auditory response, 
it is also independent of such stimu- 
lus parameters as the number of 
components in the complex (Scharf, 
1959b) and the sound pressure level 
(Feldtkeller, 1955; Feldtkeller & 
Zwicker, 1956). 

Prior to the experimental measures 
of the critical band, Fletcher (1940) 
had hypothesized the existence of a 
critical band for masking. He sug- 
gested that when a white noise just 
masks a tone, only a relatively nar- 
row band of frequencies surrounding 
the tone does the masking, energy 
outside the band contributing little 
or nothing. Although attempts to 
test this hypothesis remain incon- 
clusive, investigators (Bilger & Hirsh, 
1956; Hawkins & Stevens, 1950) have 
been able to calculate values for the 
width of these hypothetical masking 
bands by assuming that the masking 
band and the just-masked tone have 
the same intensity. The calculated
values, which are labeled “critical
ratios” in Figure 1, are smaller for the
masking band than for the critical
band as measured in the experiments
cited above. As we shall see, this dis-
crepancy is more apparent than real.

Fig. 1. The width, ΔF, of the critical band
and of the critical ratio as a function of the 
frequency at the center of the band. (The or- 
dinate gives the width, in cycles per second, 
of the critical band—and of the critical ratio— 
for the center frequencies shown on the ab- 
scissa. The top curve gives the values for the 
critical band which are based upon direct 
measurements in four types of experiment; 
the bottom curve gives the values for the 
critical ratio which are calculated from meas- 
urements of the masked threshold for pure 
tones in white noise. The points on the bottom 
curve are from Hawkins and Stevens—1950. 
This figure is adapted from an article by 
Zwicker, Flottorp, and Stevens—1957, p. 556 
—which contains also a table of critical-band 
values.) (Adapted with permission of the 
Journal of the Acoustical Society of America) 
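The calculation behind the critical ratios can be sketched as follows. If the masking band and the just-masked tone are assumed to have the same intensity, the width of the band is the ratio of the tone's intensity to the intensity of the noise per cycle. The function name and the levels in the example are illustrative assumptions, not measurements from the studies cited.

```python
def critical_ratio_cps(masked_threshold_db, noise_spectrum_level_db):
    """Width (cps) of the noise band whose power equals that of the tone.

    Assumes tone power = (noise power per cycle) x (bandwidth), so the
    bandwidth is the power ratio implied by the difference in decibels.
    """
    difference_db = masked_threshold_db - noise_spectrum_level_db
    return 10.0 ** (difference_db / 10.0)


# Illustrative levels only: a tone just masked at 58 db. in a white noise
# whose level per cycle is 40 db.
print(round(critical_ratio_cps(58.0, 40.0)))
```

A difference of 18 db. between the masked threshold and the level per cycle of the noise thus corresponds to a band about 63 cps wide.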


EXPERIMENTAL MEASURES OF THE 
CRITICAL BAND 


Four types of experiment in which 
critical bands have been measured 
are reviewed: absolute threshold of 
complex sounds, masking of a band 
of noise by two tones, sensitivity to 
phase differences, and loudness. 


Threshold of Complex Sounds 


When two tones, whose frequencies 
are not too far apart, are presented 
simultaneously, a subject may report 
hearing a sound even though either 
tone by itself is below threshold. 
Gässler (1954) made careful meas-
ures of this phenomenon, using many
tones and systematically varying the
difference in frequency, ΔF, between
the lowest and highest components of
the complex sounds.² He varied the
ΔF by varying the number of equally
intense tones, which were spaced at
intervals of 20 cps. The number of
tones was increased from 1 to 40 or
until ΔF was equal to 780 cps. Each
time a tone was added, the threshold
for the whole complex was measured
by a “tracking” method (Stevens,
1958). It was necessary, of course, 
that all the tones in the complex have 
the same threshold when heard 
singly, for otherwise it would have 
been impossible to determine the 
precise cause of a change in the 
threshold for a complex whose ΔF
had been increased by the addition 
of a tone. Thus measurements were 
restricted to portions of the fre- 
quency spectrum over which a sub- 
ject’s threshold curve was flat. In 
order to study other portions of the 
spectrum, the multitone complexes 
were presented against a background 
of white noise that had been tailored 
to raise the threshold for tones at all 
the audible frequencies to the same 
level, thus artificially flattening a 
subject’s threshold curve. 

Whether the background was 
quiet, or consisted of a noise at 0 db. 
SPL, at 20 db., or at 40 db., the 
same effect was noted: as soon as 
ΔF exceeded a particular value whose
size depended upon the frequency at
the center of the complex, the thresh-
old for the multitone complex began
to increase. Similar data were re- 
ported when bands of white noise 
were substituted for the multitone 


2 Two or more tones constitute a complex
sound, i.e., a sound with energy at more than
one frequency in contrast to a single or pure
tone with most of its energy concentrated at a
single frequency.

complexes. The results indicate that
the total energy necessary for a sound 
to be heard remains constant so long 
as the energy is contained within a 
limiting bandwidth. Although differ- 
ences between the two observers in 
these experiments were sometimes of 
the order of 40%, the average size of 
the limiting bandwidths for both 
multitone complexes and bands of 
noise is approximated by the critical- 
band curve of Figure 1.³
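The arithmetic of the multitone procedure described above can be sketched briefly. The first function gives the width of a complex of equally spaced tones; the second gives the level each tone must have at threshold if the total energy needed for audibility is constant, which according to the results holds only while the complex is narrower than the limiting bandwidth. The function names are our own, and the constant-energy rule is applied here as a simplifying assumption.

```python
import math


def complex_width_cps(n_tones, spacing_cps=20.0):
    """Frequency difference between the lowest and highest of n tones."""
    return (n_tones - 1) * spacing_cps


def per_tone_level_at_threshold(n_tones, single_tone_threshold_db):
    """Level of each tone when the complex as a whole is at threshold.

    Assumes constant total energy at threshold, which the results support
    only for complexes narrower than the limiting bandwidth.
    """
    return single_tone_threshold_db - 10.0 * math.log10(n_tones)


print(complex_width_cps(40))                         # widest complex used
print(round(per_tone_level_at_threshold(2, 0.0), 1))  # two tones
```

Forty tones spaced at 20 cps span 780 cps, and with two tones inside the band each tone can be about 3 db. below its own threshold while the pair is still heard.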


Two-Tone Masking 


The masking of a narrow-band 
noise by two tones provided a second 
measure of the critical band. Using 
a tracking method, Zwicker (1954) 
measured the threshold of a narrow- 
band noise in the presence of two 
tones, one on either side of the noise. 
Increasing the difference in fre-
quency, ΔF, between the two tones
left the masked threshold for the
noise unchanged until a critical ΔF
was reached, whereupon the thresh-
old fell sharply and, in general, con-
tinued to fall as ΔF was increased
further. The two subjects who served 
in this experiment showed the same 
drop in threshold at approximately 
the same ΔF for a given center fre-
quency regardless of the SPL of the
masking tones. The critical-band
curve of Figure 1 gives the approxi-
mate values of ΔF at which the mask-
ing effect of two tones is sharply re-
duced.


3 Gässler (1954) measured a critical band of
165 cps at 1000 cps. Garner (1947) had writ- 
ten earlier that “the best estimate . . . is that 
a band of frequencies no wider than 175 cps 
around 1000 cps is necessary if temporal in- 
tegration of acoustic energy is to be perfect” 
(p. 813). His estimate was based upon meas- 
urements of the threshold changes for a wide- 
band noise, an unfiltered 1000-cycle tone, and 
a filtered 1000-cycle tone as a function of 
bandwidth which was varied by varying the 
duration of the signal.


Sensitivity to Phase


The critical band is also relevant 
to phase sensitivity, measured by a 
comparison between the ear’s ability 
to detect amplitude modulation 
(AM) and its ability to detect fre- 
quency modulation (FM). This pro- 
cedure requires some explanation. 

When the amplitude of a tone is 
modulated—i.e., alternately  in- 
creased and decreased—a three-tone 
complex is produced with the original 
tone (the “carrier”) at the center of
the complex and a tone on either 
side (side bands). When the frequency 
of a tone is modulated over a narrow 
range, a three-tone complex is also 
produced.⁴ The only important dif-
ference between the three-tone com- 
plex that is produced under AM and 
the complex that is produced under 
FM concerns the phase relations 
among the components. Conse- 
quently, any difference in the ear’s 
sensitivity to AM and FM would 
presumably depend upon these phase 
relations. 
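The two three-tone complexes can be written out explicitly. In the sketch below the AM decomposition is exact, whereas the FM decomposition is the usual narrow-range approximation and holds only when the modulation index is small. The function names, and the convention of marking a reversal of phase by a negative amplitude, are our own assumptions.

```python
import math


def am_components(carrier_cps, rate_cps, depth):
    """(frequency, amplitude) of the three tones of an AM tone, unit carrier."""
    return [(carrier_cps - rate_cps, depth / 2.0),
            (carrier_cps, 1.0),
            (carrier_cps + rate_cps, depth / 2.0)]


def narrowband_fm_components(carrier_cps, rate_cps, index):
    """Three-tone approximation of an FM tone, valid for a small index only.

    The negative amplitude marks the lower side band as reversed in phase;
    this reversal is the only difference from the AM complex.
    """
    return [(carrier_cps - rate_cps, -index / 2.0),
            (carrier_cps, 1.0),
            (carrier_cps + rate_cps, index / 2.0)]


def synthesize(components, t):
    """Instantaneous value at time t (sec.) of a sum of cosine components."""
    return sum(a * math.cos(2.0 * math.pi * f * t) for f, a in components)


tones = am_components(1000.0, 40.0, 0.3)
# Separation between the side bands is twice the rate of modulation.
print(tones[2][0] - tones[0][0])
```

For a 1000-cycle carrier modulated 40 times per second, the side bands lie at 960 and 1040 cps, so the separation between them is 80 cps under either kind of modulation.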

Zwicker (1952) found, indeed, that 
in order for a subject to just hear a 
difference between a modulated and 
a pure, unmodulated tone, a smaller 
amount of AM is required than FM. 
The ear is more sensitive to AM than 
to FM, however, only at low rates of 
modulation. As the rate of modula- 
tion is increased, the difference in 
sensitivity to AM and FM gradually 
disappears. How do these results 
pertain to the critical band? The rate 
at which a tone is modulated deter-
mines the frequency separation, ΔF,
between the side bands of the three- 
tone complex produced under the 
modulation. It turns out that the 
rate of modulation at which AM and 


4 For a lucid discussion of the intricacies of 
modulation, consult Stevens and Davis (1938, 
pp. 225-231).


Fig. 2. The loudness level of a band of
noise centered at 1000 cps measured as a 
function of the width of the band. (The 
parameter is the effective SPL of the noise. 
The dashed line shows that the bandwidth at 
which loudness begins to increase is the same 
at all the levels tested. This figure is adapted 
from the book, Das Ohr als Nachrichten-
empfänger, by Feldtkeller and Zwicker—1956,
p. 82.) (Adapted with permission of S. Hirzel 
Verlag) 


FM become equally difficult to de-
tect corresponds to values of ΔF that
are essentially the same as the cri- 
tical-band values given in Figure 1. 
Zwicker’s investigation showed, 
moreover, that the critical band de- 
termined by phase sensitivity is in- 
dependent of the SPL of the modu- 
lated tone and varies only as a func-
tion of the frequency of the “carrier”
which lies, of course, at the center of 
the band. 

Since the complexes produced un- 
der AM and those produced under 
FM differ primarily with respect to 
phase relations, the ear may be able 
to detect AM more easily than FM 
at low rates of modulation because it 
is more sensitive to the kind of phase 
relations that occur under AM. The 
ear seems to be sensitive to the 
phase relations, however, only when 
the ΔF of the complex is less than a
critical band. When ΔF is greater
than a critical band, there is no dif-
ference in sensitivity to AM and
FM, implying that, beyond the 
critical band, the phase relations 
within the complex no longer serve 
as a significant cue in the detection 
of modulation. 


Loudness of Complex Sounds 


The critical band has been meas- 
ured most thoroughly in studies of 
the loudness of complex sounds as a 
function of bandwidth. Zwicker and 
Feldtkeller (1955) demonstrated that 
the loudness of a white noise is in- 
dependent of bandwidth until the 
critical band is exceeded, whereupon 
the loudness begins to increase. 
Their procedure was straightfor- 
ward. They presented a band of 
filtered white noise and a comparison 
tone alternately through a single ear- 
phone. The subject adjusted the in- 
tensity of the tone until the tone and 
the noise sounded equally loud. The 
overall SPL of the noise was held 
constant; only the bandwidth was 
varied from judgment to judgment. 
(Zwicker and Feldtkeller did not re- 
port the number of subjects or the 
amount of variability; probably only 
a few, well-trained subjects were 
used and the variability was small.) 
Figure 2 shows what happens to the 
loudness of a band of noise when its 
width is increased. These curves are 
for bands centered at 1000 cps, 
which was the geometric mean of 
the two half-power points. At all 
the SPLs tested, from 30 to 80 db., 
the loudness of the noise remains 
constant and the curve is flat up to a
bandwidth of about 160 cps, where- 
upon the loudness begins to increase. 
Within the critical band, the noises 
are as loud as a tone of equal inten- 
sity, having the same frequency as 
the center of the band. Functions 
similar in shape to those in Figure 2 
were generated for bands centered at 
500, 2000, and 4000 cps. The bandwidth
at which loudness begins to
increase defines the critical band for 
loudness, which was found to have 
approximately the same values as 
had been measured for threshold, 
two-tone masking, and phase sensi- 
tivity (see Figure 1). 
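
The remark that each band was centered at the geometric mean of its two half-power points fixes the cutoff frequencies once the center and the width are given. The following minimal Python sketch illustrates that arithmetic only. The function name `band_edges` is hypothetical, and the 160-cps width at 1000 cps is simply the approximate value quoted above, not a description of the filters actually used.

```python
import math
from typing import Tuple


def band_edges(center_cps: float, bandwidth_cps: float) -> Tuple[float, float]:
    """Return (lower, upper) cutoffs in cps for a band whose geometric-mean
    center is center_cps and whose width (upper - lower) is bandwidth_cps."""
    # Solve upper * lower = center^2 together with upper - lower = bandwidth.
    lower = (-bandwidth_cps + math.sqrt(bandwidth_cps ** 2 + 4.0 * center_cps ** 2)) / 2.0
    upper = lower + bandwidth_cps
    return lower, upper


if __name__ == "__main__":
    lower, upper = band_edges(1000.0, 160.0)
    # The geometric mean of the two cutoffs recovers the nominal center.
    print(round(lower, 1), round(upper, 1), round(math.sqrt(lower * upper), 1))
```

Because the center is a geometric rather than an arithmetic mean, the band extends slightly farther above the center frequency than below it.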

Zwicker and Feldtkeller studied 
continuous spectra, i.e., noises that 
have energy at every frequency be- 
tween the cutoff points. Bauch 
(1956) studied line spectra, i.e., 
sounds that have energy at two or 
more separate frequencies. He
measured the loudness of three-tone
complexes, produced by amplitude
modulation, as a function of the
difference, ΔF, in cps between the
lowest and highest components of the
complex. Bauch obtained the same
results with three-tone complexes
centered at various frequencies as
Zwicker and Feldtkeller had obtained
with bands of noise. For
values of ΔF less than a critical band,
loudness is constant except when
ΔF is so small that beats are heard.
The loudness begins to increase as a
function of ΔF only when ΔF exceeds
the critical band. 
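
As an illustration only, the relation between a three-tone complex and the separation of its outer components can be sketched in Python. The sketch assumes sinusoidal amplitude modulation, for which the side components lie one modulation frequency below and above the carrier. The function names and the example modulation rates are hypothetical, and the 160-cps critical band at 1000 cps is the approximate value cited earlier for bands of noise.

```python
from typing import List


def am_components(carrier_cps: float, modulation_cps: float) -> List[float]:
    """Frequencies of the three components of a sinusoidally
    amplitude-modulated tone: lower side tone, carrier, upper side tone."""
    return [carrier_cps - modulation_cps, carrier_cps, carrier_cps + modulation_cps]


def delta_f(components: List[float]) -> float:
    """Separation in cps between the lowest and highest components."""
    return max(components) - min(components)


def exceeds_critical_band(components: List[float], critical_band_cps: float) -> bool:
    """True when the complex is wider than the assumed critical band, i.e.,
    when loudness would be expected to begin growing with the separation."""
    return delta_f(components) > critical_band_cps


if __name__ == "__main__":
    for rate in (50.0, 100.0):  # hypothetical modulation rates in cps
        tones = am_components(1000.0, rate)
        print(tones, delta_f(tones), exceeds_critical_band(tones, 160.0))
```

Under this assumption the separation equals twice the modulation frequency, so a complex centered at 1000 cps would exceed a 160-cps band once the modulation rate passes 80 cps.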

At the time that the critical band 
was being mapped out in Germany 
at the Technischen Hochschule Stutt- 
gart (Bauch, 1956; Gissler, 1954; 
Zwicker, 1952, 1954; Zwicker & 
Feldtkeller, 1955) some of us at the 
Psycho-Acoustic Laboratory at Harvard
were puzzled by our failure to
find an increase in the loudness of a
four-tone complex as a function of
ΔF. We had assumed that loudness
summation begins as soon as ΔF is
increased. We were, however, studying
four-tone complexes whose ΔFs
were smaller than a critical band.
When reports of the critical band
came from Germany, our results began
to make sense and, indeed, agreed
well with those being obtained across
the sea.

Fig. 3. The dependence of the loudness of a
four-tone complex, centered at 1000 cps, on
spacing and level. (Each point represents the
median of two judgments by each of 10 listeners.
The symbol T means the comparison tone
was adjusted; C means the complex was adjusted.
This figure is from Zwicker, Flottorp,
and Stevens, 1957, p. 550.) (Reproduced
with permission of the Journal of the Acoustical
Society of America)

The experiments were continued
at Harvard by S. S. Stevens
with G. Flottorp from Norway and 
E. Zwicker from Germany (Zwicker 
et al., 1957). Four-tone complexes 
and bands of white noise, at various 
center frequencies and various SPLs, 
were studied. In these experiments, 
16 to 22 untrained subjects some- 
times adjusted the complex sound 
and sometimes adjusted the compari- 
son until the two were equally loud. 
Figure 3 shows a typical set of results, 
those for four-tone complexes centered
at 1000 cps. Each point is the median 
of 20 loudness matches. Although the 
subjects were somewhat variable in
their judgments, the medians are 
orderly and the lines through the 
data show a break at approximately 
the same value of ΔF that had been
measured in Germany. The critical 
band made the transatlantic journey 
safely and invariantly. 

Another investigation carried out 
at Harvard (Scharf, 1959a) showed 
that at low levels, between 5 and 35 
db. above threshold, where the loud- 
ness of a complex sound increases 
more slowly with bandwidth than at 
higher levels, the critical band must 
be exceeded before loudness begins 
to change as a function of band- 
width. 

Niese (1960), in Dresden, has also 
studied loudness summation and the 
critical band. He presented the 
sound stimuli not only through earphones
(as in all the previous experiments)
but also through a loudspeaker
in a free field, i.e., in an
anechoic room where sounds are almost
completely absorbed by specially
constructed walls. The results
for free-field listening are similar to
those for earphone listening; the
loudness of a band of white noise begins
to increase with bandwidth
when the critical band is exceeded.
Niese found, however, that the loud- 
ness did not continue to increase in- 
definitely with bandwidth, but in- 
creased about 8 db. and then re- 
mained constant for bandwidths 
greater than 1000 to 5000 cps de- 
pending upon the center frequency. 
It may be that the loudness did not 
increase further because the avail- 
able energy was spread to very low 
and very high frequencies which con- 
tributed little to the total loudness. 

In other experiments, Niese (1960) 
tested the assumption that loudness 
summation is a peripheral process 
occurring independently in each ear. 
In one procedure, a band of white
noise was divided in half at its center 
frequency; the upper half was pre- 
sented through an earphone to one 
ear and the lower half to the other 
ear. The loudness of the noise in 
both ears did not begin to increase 
with bandwidth until the overall 
width exceeded a value approxi- 
mately twice the critical band, i.e., 
until the noise in each ear was wider 
than a single critical band. In a sec- 
ond procedure, two narrow bands, 
each 100 cycles wide, were first pre- 
sented together to one ear and later 
separately to each ear. When pre- 
sented together to a single ear, the 
loudness of the two bands increased 
with the frequency separation be- 
tween them. When, on the other 
hand, one band was presented to 
each ear, the loudness did not in- 
crease with the frequency separa- 
tion, no matter how great it was. 
The loudness did not increase be- 
cause the band of noise presented to 
each ear was never wider than a 
critical band; it was always 100 
cycles wide. Loudness summation 
thus seems to depend only upon the 
distribution of energy in one ear, 
suggesting that summation takes 
place not at some higher level in the 
auditory system where nerve im- 
pulses from the two ears join, but at 
the periphery, probably in the inner 
ear. 

Still another aspect of loudness 
summation has been recently in- 
vestigated (Scharf, 1959b). The re- 
sults indicate that the loudness of a 
complex sound remains essentially 
unchanged when only the number of 
components in the complex is varied. 
The loudness of the complex increases
with ΔF when ΔF is greater than a
critical band, but at any given value
of ΔF the loudness is approximately
invariant with the number of components,
provided the overall sound
pressure remains invariant.

The several experiments in loud- 
ness summation, along with those on 
threshold, two-tone masking, and 
phase sensitivity provide a firm body 
of evidence for the critical band. 
There remains, however, the ques- 
tion of the role of the critical band in 
the masking of pure tones by white 
noise. 


MASKING BANDS 


Although the empirical measures 
of the critical band are quite recent, 
the concept of a critical band was 
expounded some 20 years ago by
Fletcher (1940) when he hypothesized
that: (a) a pure tone that is
masked by a white noise is in effect
masked only by a narrow band of
frequencies surrounding the tone,
and (b) the intensity of the part of
the band that does the masking is
equal to the intensity of the tone.

Fletcher (1940) presented some
preliminary experimental results to
support his thesis, but the projected
full-scale experiment has apparently
not been reported. Nonetheless the
concept of a critical band has become
important in theories about
masking. Moreover, the acceptance
of Fletcher’s hypotheses permits the 
calculation of values for the masking 
band from the measurement of the 
masking of pure tones by white noise 
(Hawkins & Stevens, 1950). The 
calculated values for the masking 
band turn out to be about two-and- 
one-half times smaller than the em- 
pirical values for the critical band, as 
measured in experiments on loud- 
ness, two-tone masking, etc. This 
discrepancy, however, may be re- 
solved either by a modification of 
Fletcher’s second hypothesis, or, bet- 
ter, by direct measurements of the 
masking band. Let us turn first to 
the indirect measurements of the
masking band and the assumptions 
underlying them. 


Indirect Measures of the Masking Band


If both Fletcher’s hypotheses 
about the existence of a masking 
band and about the equality of the 
intensities of the tone and noise are 
accepted, it is possible to calculate 
the size of the masking band from the 
masked thresholds for pure tones in 
white noise. Only one empirical 
operation is necessary. The thresh- 
old for a tone is measured in the 
presence of a white noise. From the 
intensity of the just-masked tone 
and the intensity of the masking 
noise, it is fairly simple to calculate 
how large a band within the noise 
contains the same energy as the tone. 
The width of this band is, by defini- 
tion, the masking band. Its width is 
calculated by taking the ratio of the 
intensity of the tone to the intensity 
per cycle of the noise. (Since a white 
noise contains all audible frequencies 
at equal intensity, the intensity per 
cycle is uniform throughout.) For 
example, Hawkins and Stevens (1950) 
found that the ratio between the in- 
tensity of a 1000-cycle tone (at its 
masked threshold) and the intensity 
per cycle of the masking noise is 
63:1 or 18 db. Since the intensity in 
each one-cycle band of noise is 1/63 
the intensity of the masked tone, a 
band of frequencies 63 cps wide will 
have an overall intensity equal to 
that of the tone. Therefore, accord- 
ing to the second hypothesis, the 
masking band is taken to be 63 cps 
wide for a tone of 1000 cps. Values 
for the masking band that are calculated
in the foregoing manner will be
called “critical ratios,” as suggested
by S. S. Stevens (see Zwicker et al., 
1957). 
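
The calculation just described amounts to a conversion between a decibel difference and an intensity ratio. The short Python sketch below illustrates it under the stated assumptions (a flat noise spectrum and equality of tone and band intensities). The function names are hypothetical, and the figures are the 18 db. and 63:1 values quoted above for a 1000-cycle tone.

```python
import math


def critical_ratio_cps(tone_level_db: float, noise_level_per_cycle_db: float) -> float:
    """Width in cps of the noise band whose total intensity equals that of the
    just-masked tone, given the tone level and the noise level per cycle."""
    # A difference of D db corresponds to an intensity ratio of 10 ** (D / 10).
    return 10.0 ** ((tone_level_db - noise_level_per_cycle_db) / 10.0)


def ratio_to_db(intensity_ratio: float) -> float:
    """Express an intensity ratio as a level difference in db."""
    return 10.0 * math.log10(intensity_ratio)


if __name__ == "__main__":
    # A tone 18 db above the noise level per cycle gives a ratio near 63.
    print(round(critical_ratio_cps(18.0, 0.0), 1))
    print(round(ratio_to_db(63.0), 2))
```

Because the spectrum is assumed flat, the intensity ratio and the width in cps are numerically the same quantity, which is why a single measured threshold suffices.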


Hawkins and Stevens measured
the masked thresholds at many fre- 
quencies from 100 to 9000 cps in the 
presence of white noise at levels from 
20 to 90 db. They found that the 
ratio of the intensity of a just-masked 
tone to the intensity per cycle of the 
masking noise remains constant at 
all noise levels except the very lowest. 
In other words, the critical ratio does 
not change as a function of the level 
of the masking noise. The critical 
ratio is, however, different at dif- 
ferent center frequencies, as shown in 
Figure 1. The results of these experi- 
ments agree with similar measure- 
ments that Fletcher and Munson 
(1937) had made of the critical ratio 
for tones masked by a uniform mask- 
ing noise. 

Bilger and Hirsh (1956) also calcu- 
lated critical ratios from masking 
data obtained with bands of white 
noise 250 mels wide. (The mel is a 
unit of pitch.) The substitution of a 
250-mel band, which is about five 
times as wide as the critical ratios 
measured by Hawkins and Stevens, 
is consistent with the assumption 
that the energy outside the masking 
band contributes nothing to the 
masking effect. If this, Fletcher’s 
fundamental assumption, is true the 
critical ratio should be the same in 
both experiments. The results of the 
two independent experiments were, 
in fact, in close agreement. 

In all these experiments the calcu- 
lated value of the critical ratio de- 
pends upon the measured value of 
the masked threshold which may not 
be very reliable. Blackwell (1953) 
has shown, for example, that the 
value obtained for a threshold de- 
pends upon the psychophysical meth- 
od employed in its measurement. 
The congruence of the results of the 
several experiments tends, however, 
to negate this criticism. Using the 
reported threshold measurements, we
can modify Fletcher’s second assumption
so that the masking band has
the same values as the critical band. 

Instead of assuming, quite arbi- 
trarily, that the intensities of the 
masked tone and of the masking 
band are equal, we can just as well 
assume that the intensity of the 
masking band is two-and-one-half 
times as great as that of the masked 
tone. Over most of the frequency 
range, this simple modification of 
Fletcher’s second hypothesis yields 
values for the masking band that are 
equal to the measured values of the 
critical band. A simple modification 
succeeds because, as Figure 1 shows, 
except for very low frequencies, the 
critical band and the critical ratio 
are the same functions of center fre- 
quency. Since this new assumption is 
ad hoc and arbitrary, it will probably 
have little appeal. What we need is a 
more direct and straightforward
type of evidence of the existence of 
the masking band. 
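
The effect of the modified assumption can be checked with a little arithmetic, sketched here in Python. The function names are hypothetical. The factor of two-and-one-half and the 63-cps critical ratio at 1000 cps are taken from the text, and the comparison with a critical band of about 160 cps refers to the value cited earlier for loudness.

```python
import math

# Assumed ratio of masking-band intensity to tone intensity (from the text).
BAND_TO_TONE_FACTOR = 2.5


def masking_band_cps(critical_ratio_cps: float, factor: float = BAND_TO_TONE_FACTOR) -> float:
    """Masking bandwidth implied when the band is `factor` times as intense
    as the just-masked tone; the noise spectrum is assumed flat."""
    return factor * critical_ratio_cps


def tone_re_band_db(factor: float = BAND_TO_TONE_FACTOR) -> float:
    """Level of the just-masked tone relative to the whole masking band, in db."""
    return -10.0 * math.log10(factor)


if __name__ == "__main__":
    print(masking_band_cps(63.0))        # width in cps implied at 1000 cps
    print(round(tone_re_band_db(), 2))   # tone level re the band, in db
```

On these figures the implied width at 1000 cps is close to the 160 cps found for loudness, and the tone lies about 4 db. below the overall level of the band.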


Direct Measures of the Masking Band 


The direct measurement of the 
masking band requires the sampling 
of the masked threshold for tones in 
the presence of bands of noise of dif- 
ferent widths. If a masking band 
exists, the tone should become more 
difficult to detect as the bandwidth 
of the noise is increased up to the 
value of the masking band. Increasing 
the bandwidth beyond the masking
band should not raise the threshold 
for the tone any further. (In such 
experiments, energy is added to the 
noise as the bandwidth is increased, 
unlike experiments on loudness sum- 
mation where a constant amount of 
noise energy is spread over a wider 
frequency range in order to increase 
the bandwidth.) Direct measurements
of this type have been reported
by Fletcher (1940), Hamilton (1957), 
and Schafer, Gales, Shewmaker, and 
Thompson (1950). Some of the re- 
cent experiments suggest that the 
masking band is larger than the crit- 
ical ratio and may approximate the 
critical band as measured for other 
auditory phenomena. 
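
The pattern that a direct measurement is expected to show can be put in the form of an idealized model, sketched here in Python. It is not a description of any of the experiments cited. It assumes a noise of constant level per cycle and a masked threshold that follows the noise power falling inside the masking band. The function name, the `offset_db` parameter, and the example values are hypothetical.

```python
import math


def masked_threshold_db(
    bandwidth_cps: float,
    masking_band_cps: float,
    noise_level_per_cycle_db: float,
    offset_db: float = 0.0,
) -> float:
    """Idealized masked threshold for a tone at the center of a noise band.

    Only the noise inside the masking band is assumed to contribute, so the
    threshold grows with bandwidth up to the masking band and is flat beyond it.
    """
    effective_width = min(bandwidth_cps, masking_band_cps)
    return noise_level_per_cycle_db + 10.0 * math.log10(effective_width) + offset_db


if __name__ == "__main__":
    # Hypothetical example: a 145-cps masking band and noise at 40 db per cycle.
    for width in (19.0, 50.0, 145.0, 400.0, 1100.0):
        print(width, round(masked_threshold_db(width, 145.0, 40.0), 1))
```

In such a model the bandwidth at which the curve stops rising is the masking band. A fixed `offset_db` builds in a constant signal/noise ratio within the band, which is itself an assumption that measurement may not bear out.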

In the first and most famous of 
these experiments, Fletcher (1940) 
measured the threshold for tones of 
seven different frequencies ranging 
from 125 to 8000 cps in the presence 
of bands of noise of various widths. 
No information about subjects, ap- 
paratus, or procedure was given. The 
results of this admittedly preliminary 
experiment provided some evidence 
for the masking-band hypothesis; 
the masked threshold tended first to 
increase and then to remain constant 
as the bandwidth of the masking 
noise was increased. The results
seemed also to justify the assumption
that, within the masking band,
the intensity of the noise and the just-masked
tone are equal: a band of
noise, 30 cps wide, just masked a
tone lying at its center frequency and
having the same intensity. Precise
determinations of the width of the 
masking band were not possible, 
however, because the data were 
highly variable and only a few band- 
widths had been sampled. Of band- 
widths having values in the vicinity 
of those for the masking band, only 
one, 200 cps wide, was adequately 
sampled. Nevertheless, relying heav- 
ily upon the assumption that the 
masking band and the just-masked 
tone are equally intense and upon 
the threshold measurements made in 
the presence of wide-band noise, 
Fletcher suggested values for the 
width of the masking band. These 
values, which Fletcher cautioned 
might be wrong by a factor of two, 
turned out to be approximately the
same as the critical ratios calculated 
in 1950 by Hawkins and Stevens (see 
Figure 1). This similarity is not sur- 
prising, for the values recommended 
by Fletcher were, in effect, critical 
ratios. While suggestive, Fletcher’s 
results provided neither conclusive 
support for his hypotheses nor a solid 
basis for the direct measurement of 
the width of the masking band. 

Hamilton’s (1957) more recent 
work provides a direct and precise 
measure of the masking band. Meas- 
uring the masked threshold for an 
800-cycle tone in the presence of 
bands of noise that were centered at 
800 cps and that varied in width 
from 19 to 1100 cps, he found that up 
to a bandwidth of 145 cps the masked 
threshold increased as the width of 
the masking noise increased. Beyond 
145 cps the threshold remained con- 
stant, indicating that the masking 
band at 800 cps is 145 cps wide. The 
critical band measured in four other 
types of experiment is also about 145 
cycles wide at 800 cps (see Figure 1). 
This coincidence of values is remark- 
able in view of the variability in- 
herent in these experiments and 
Hamilton’s apparent unfamiliarity 
with the other measures of the crit- 
ical band. 

A second important result in Ham- 
ilton’s experiment shows that the 
difference (the signal/noise ratio) be- 
tween the intensity of the 800-cycle 
tone at its masked threshold and the 
overall intensity of the masking noise
is not constant, even when the width 
of the masking noise is less than a 
critical band. The signal/noise ratio 
decreases from about 0 db. for a band 
30 cps wide to almost —4 db. for the 
critical width of 145 cps. (Hamilton 
reports similar results by Bauman, 
Dieter, Lieberman, and Finney,
1953.) Fletcher had also found that 
a band 30 cps wide just masks a tone
at its center when the signal/noise 
ratio is 0 db., i.e., when the intensi- 
ties of the tone and the noise are 
equal. This equality at a width of 30 
cps suggested that at the critical band- 
width also, the tone and noise have 
the same intensity. Hamilton showed, 
however, that at the critical band- 
width the signal/noise ratio is not 
the same as at 30 cps. Accordingly, 
Fletcher’s threshold measurements 
for a tone in a 30-cps-wide band of 
noise probably lend no support to the 
critical-ratio hypothesis; they are, 
however, consistent with critical- 
band values for the masking band. 
Although Hamilton studied only 
one frequency, his results provide 
valuable information because they 
are orderly and self-consistent. Probably
the use of a forced-choice procedure
with well-trained subjects contributed
to the preciseness of the results.
In contrast, Schafer et al.
(1950) report a more extensive experiment
whose results are difficult to
interpret. They measured the masked 
threshold for tones in three frequency 
regions as a function of the band- 
width of the surrounding noise. In- 
stead of the usual white noise, they 
used bands of synthetic noise com- 
posed of tones one cycle apart. Pre- 
liminary experiments indicated no 
important difference between these 
bands of synthetic noise and bands 
of white noise. Twenty-five subjects 
served in the main experiments in 
which a random method of limits 
was used to measure the masked 
threshold for a tone that had been 
matched in pitch to the masking 
noise. The results suggest the pres- 
ence of a masking band, but since no 
sharp change in the masked threshold
was observed as the bandwidth
was increased, the width of the masking
band can be estimated only approximately.
In the three frequency
regions that were tested, the results 
suggest a masking band that is larger 
than that given by the critical ratio, 
and one that could well be as large as 
a critical band. 

Schafer et al. (1950) interpreted 
their results to indicate no change in 
the signal/noise ratio within the 
masking band. Hamilton (1957), on 
the other hand, did find a small but 
consistent change in the signal/noise 
ratio within the masking band. Since, 
however, Schafer’s observers were 
too variable to permit a precise meas- 
urement of changes in the signal/ 
noise ratio, the small difference be- 
tween the results of the two experi- 
ments is probably not significant. 
There is also some question about 
what Schafer et al. measured. Their 
use of a tone “matched in pitch to the
masking noise” may account for some
of the disparity between their results 
and Hamilton’s. 

These two experiments, by Hamil- 
ton and by Schafer, seem to be the 
only direct tests of the masking-band 
hypothesis since Fletcher’s original 
attempt. One related experiment 
(Webster, Miller, Thompson, & Dav- 
enport, 1952) deserves mention. A 
white noise with octave gaps was 
used to mask tones at frequencies 
corresponding to those in and near 
the gaps. The measurements of the 
masked thresholds seem to suggest 
that Fletcher’s values for the mask- 
ing bands are too small. 

The lack of extensive tests of the 
masking-band hypothesis prevents a 
definitive statement about the valid- 
ity of the hypothesis, and even less 
may be said about the size of the 
bands. Nevertheless the net impres- 
sion one obtains from the literature is 
that a masking band does exist and 
that it may well be the same width
as the critical band.⁵


OTHER CORRELATES OF THE 
CRITICAL BAND 


We have seen that the function 
relating the critical band to the fre- 
quency at the center of the band is 
derived from four types of experi- 
ment and that the width of the mask- 
ing band may be the same as that of 
the critical band. Of interest, also, 
is the resemblance that the critical- 
band function bears to several other 
functions of frequency: the place of 
maximal displacement on the basilar 
membrane, the difference limen for 
frequency, and the mel scale of sub- 
jective pitch. These similarities have 
been noted elsewhere with respect to 
the critical band (Zwicker et al., 
1957) and also with respect to the 
critical ratio (Fletcher, 1940, 1953; 
von Békésy & Rosenblith, 1951). 

Perhaps the most interesting fact 
about the critical band is that it 
seems to correspond to a constant 
distance of about 1.3 millimeters 
along the basilar membrane. The 
first line in Figure 4 is a slightly 
idealized schematization of the fre- 
quency representation on the basilar 
membrane. The second line shows 
that 24 or 25 critical bands may be
represented by equal-sized segments
of the membrane. The boundaries of
the critical bands are not fixed, of
course, since a critical band may take
shape around any frequency.

⁵ Since the preparation of this article,
Greenwood (1960) has reported an extensive
study that confirms the suggestion that there
is a masking band and that it is the same size
as the critical band. Greenwood measured the
threshold for pure tones presented in bands of
white noise. He varied not only the width of
the bands of noise around a given center frequency,
but also the sensation level of the
noise and the frequency of the masked tone.
Investigating bands of noise in five regions of
the spectrum, he found consistent evidence for
the existence of a fairly sharp masking band
approximately the same size as the critical
band.

Fig. 4. Representation on the basilar membrane
of (1) frequency in kilocycles, (2) critical
bands, (3) pitch (Stevens & Volkmann,
1940), and (4) just noticeable differences for
frequency; the fifth line marks off distance in
millimeters on the basilar membrane. (This
figure is adapted from the book, Das Ohr als
Nachrichtenempfänger, by Feldtkeller and
Zwicker, 1956, p. 60.) (Adapted with permission
of S. Hirzel Verlag)

The mel and the jnd for frequency 
also correspond to constant distances 
on the basilar membrane (see the 
third and fourth lines in Figure 4). 
It is, therefore, not surprising that 
the critical-band function looks very 
much like the functions for the mel 
scale and the jnd scale. Measured in 
mels, the size of the critical band 
varies little, from 100 mels at low 
center frequencies to 180 mels at 
high frequencies. The mel scale is 
not accurate enough, however, to 
distinguish 100 from 180 mels at op- 
posite ends of the scale, so that the 
pitch range of the critical band may, 
in fact, be fairly constant, perhaps 
approximating 150 mels. 

The width of the critical band on 
the basilar membrane is determined 
from the map relating the frequency 
of pure tones to the position of maxi- 
mal stimulation on the membrane 
(von Békésy, 1949). Although no direct
physiological measures of the
critical band have been reported, the 
fact that throughout the frequency 
spectrum the critical band corre- 
sponds to a constant length of the 
basilar membrane lends support to 
the notion that this band may be re- 
garded as a fundamental unit of hear- 
ing. 


FUTURE PROSPECTS 


With the experimental basis for 
the critical band reasonably well es- 
tablished, investigators are beginning 
to consider the relevance of the crit- 
ical band to the loudness of pure 
tones, to temporal integration, to 
deafness, to speech perception, and 
to other auditory processes. 

Zwicker (1956, 1958), for example, 
has argued that the loudness of an in- 
tense pure tone is a composite loud- 
ness because the displacement of the 
basilar membrane is spread over 
many critical bands. Zwicker assumes
that the “loudnesses” corresponding
to these critical bands summate
to give the total loudness of the
tone. Similar assumptions underlie 
Zwicker’s (1958) system for the ob- 
jective calculation of the loudness of 
a complex noise. The loudness of a 
noise is assumed to equal the sum of 
the individual loudnesses of the com- 
ponent critical bands after allowance 
for mutual masking effects among the
bands. 

Other investigators are studying 
temporal integration for short tone 
pulses (cf. Plomp & Bouman, 1959).
Since short tone pulses are in effect 
multicomponent complexes whose 
bandwidth varies with time, the in- 
tegration of energy at threshold 
would be expected to occur within 
the critical band. 

Clinical use of the critical band has 
been attempted by deBoer (1960) in 
the diagnosis of hearing loss. His re- 
sults suggest that the critical-band 
mechanism may be disturbed in cer- 
tain kinds of deafness. The related 
problem of individual differences for 
the critical band has remained essen- 
tially uninvestigated except for some 
observations by Niese (1960) and 
indications from earlier data (e.g., 
Gissler, 1954) that the size of the 
critical band may vary from person 
to person, just as thresholds do. 

Although no answers have yet
come forth, phoneticists are beginning
to ask about the role of the critical
band in the perception of speech.
Musicians may soon add their
problems. The quest has begun in
earnest. Now that a fundamental
unit of hearing has been identified,
it remains to discover its role in all
the many processes called hearing.

PERSEVERATIVE NEURAL PROCESSES AND CONSOLIDATION
OF THE MEMORY TRACE¹


STEPHEN E. GLICKMAN 
Northwestern University 


For a short period between the 
turn of the century and the first 
world war, theories of perseveration 
figured prominently in attempts to 
understand many of the newly dis- 
covered phenomena of learning and 
forgetting. Although the exact lines 
of speculation varied from one writer 
to the next, in general, a neural 
fixation process was assumed to con- 
tinue after the organism was no 
longer confronted with the stimuli to 
be learned. This fixation process was 
deemed crucial to efficient retention 
and interference with perseveration 
was presumed to have an adverse ef- 
fect on an organism’s ability to re- 
member stimuli to which it had been 
exposed. 

The first clear statement of such a 
consolidation theory is generally attributed
to Müller and Pilzecker
(1900). In order to account for the
existence of retroactive inhibition,
Müller and Pilzecker postulated the
existence of a neural perseverative 
process, subject to external inter- 
ference and requisite to the consolida- 
tion of the memory trace for recently 
acquired material. Although knowledge
of the physiology of brain function
was still quite limited, Müller
and Pilzecker nevertheless attempted
to be as precise as possible regarding
the neural locus of perseveration.
They rejected the notion that perseveration
was in any way analogous
to sense organ processes such as those
believed to underlie the negative
afterimage, on the grounds that these
sensory processes were of too short
duration. On the other hand, the perseveration
which Müller and Pilzecker
observed did appear to be similar
to the repetitious or stereotyped
behavior resulting from diseases of
the subcortical motor centers. It was
with these latter structures that
Müller and Pilzecker associated perseverative
activity.

¹ The preparation of this paper was made
possible by a postdoctoral fellowship held by
the author at the Department of Physiology
and Biophysics, University of Washington
Medical School, during the summer of 1959,
and supported by Grant 2-B5082 from the
National Institute of Neurological Diseases
and Blindness. The author is indebted to
C. P. Duncan, S. M. Feldman, T. Kennedy,
and D. Kimura for critically reading the
manuscript.

Numerous other psychologists were 
concerned with perseveration theory 
during the early 1900s. Among 
these, DeCamp (1915) advanced 
what was probably the most detailed 
piece of pseudoneurological specula- 
tion: 

From the neurological standpoint, in the 
learning of a series of syllables, we may as- 
sume that a certain group of synapses, nerve 
cells, nerve paths, centres, etc., are involved. 
Immediately after the learning process the 
after-discharge continues for a short time, 
tending to set associations between just 
learned syllables. Any mental activity engaged
in during this after-discharge, involving
or partially involving the same neurological
group, tends, more or less, to block the
after-discharge, and give rise to retroactive
inhibition (p. 68).


Some years previous, Sherrington 
(1906) had described the phenom- 
enon of afterdischarge in spinal re- 
flexes and discussed the blockage of 
such discharges by subsequent stim- 
uli. It is interesting to note that 
this provided the theoretical model
for DeCamp’s view of perseverative 
processes in much the same manner 
as Sherringtonian physiology gen- 
erally shaped the psychologists’ con- 
ception of neural activity (see Hebb, 
1951). 

As a behavioral theory of retroac- 
tive inhibition, however, persevera- 
tion theory met with many difficul- 
ties, and was eventually replaced by 
the current concepts of associative 
interference (McGeoch & Irion, 1952; 
Osgood, 1953), although it continued 
to receive some limited support as 
a possible factor in forgetting (Wood- 
worth, 1938). Ultimately, a per- 
severation theory, erected on the 
basis of inferences from behavior, was 
no longer viable once the behavioral 
observations were either shown to be 
false or explained more parsimoni- 
ously by other hypotheses. The re- 
juvenation of this theory awaited di- 
rect support from neurology. 

Lashley (1918) once made the fol- 
lowing comment on perseveration 
theory: 


If there is a gradual strengthening of associa- 
tions during periods of nonpractice, there is 
implied a continuation of chemical changes 
within the nerve cells, initiated by the passage 
of a neural impulse through new channels and 
persisting for hours or even days without the 
influence of continued impulses. The experi- 
mental evidence upon which the belief in a 
gradual fixation of associations is based is far 
from convincing ... it all can be explained 
equally well by other hypotheses and, in view 
of the extreme importance of the point for 
physiological explanation, we should be care- 
ful not to accept the assumption of a gradual 
setting of new functional connections until 
some real evidence is advanced to support it 
(pp. 363-364). 


This healthy skepticism was cer- 
tainly justified, although even at the 
time some physiological evidence was 
available to buttress perseveration 
theory.

Shortly after the publication of
Müller and Pilzecker’s work, McDougall
(1901) called attention to
the applicability of their persevera- 
tion theory to the explanation of 
retrograde amnesia (RA) resulting 
from cerebral trauma. However, 
Burnham (1904) was apparently the 
first individual to extensively discuss 
the relationship between RA and
perseverative “consolidation” amnesia.
Burnham's paper involved an anal- 
ysis of two cases of retrograde am- 
nesia. Both of these subjects had sus- 
tained head injuries as the result of 
accidents and in both cases there was 
a loss of memory for events occurring 
during the period preceding the ac- 
cident. As the result of his studies of 
these cases and of others cited by 
Ribot (1892), Burnham suggested 
that 


The fixing of an impression depends upon a 
physiological process. It takes time for an 
impression to become so fixed that it can 
be reproduced after a long interval; for it 
to become part of the permanent store of 
memory considerable time may be neces- 
sary. This we may suppose is not merely a 
process of making a permanent impression up- 
on the nerve cells, but also a process of associ- 
ation, of organization of the new impressions 
with the old ones (p. 392). 


He further speculated that: (a) the 
time required for this fixation process 
may vary with individuals and
conditions; (b) shock produces its effects
by arresting the fixation process in 
the nervous tissue; (c) such shock 
may be produced by great fatigue, 
excitement, unconsciousness, or nar- 
cotics; (d) RA is not all-or-none and 
the extent of the amnesia is relative 
to the amount of time elapsing before 
the fixation process is interrupted; 
and finally (e) that automatic ac- 
tivity is an important factor in fixing 
impressions although it may not
necessarily be directly observable in
terms of movements. 

These remarkable observations 
would appear to have been borne out
by recent experiments in nearly every 
case and we can now advance these 
propositions with much more con- 
fidence. 

During the first 4 decades of this 
century, the phenomenon of RA con- 
stituted the only direct physiological 
evidence for the existence of a neural 
fixation process. Early references to 
it are to be found in Ballard (1913), 
Pillsbury (1913), DeCamp (1915), 
and others. Although a complete re- 
view of this literature is beyond the 
scope of the present paper, it is per- 
haps worthwhile to examine the re- 
sults of a comprehensive study by 
Russell and Nathan (1946). In a sur- 
vey of 1,029 cases of head injury, only 
133 were found to have experienced 
no RA whatsoever. Seven hundred 
and seven reported amnesia for
events occurring from several seconds
to 30 minutes preceding the
injury, while 133 reported RA of
more than 30 minutes’ duration.
Records were unavailable for 56
patients in the sample. Russell and
Nathan noted that the duration of 
RA is “in most cases a few moments 
only.” Since the use of barbiturate
hypnosis reduced the period of RA in 
only 6 of 40 cases, and produced no 
data suggestive of hysterical repres- 
sions, the authors conclude that loss 
of the material is due to a blocked 
perseveration process: 

It seems that the mere existence of the brain 
as a functioning organ must strengthen the 
roots of distant memories. The normal ac- 
tivity of the brain must steadily strengthen 
distant memories so that with the passage of
time these become less vulnerable to the
effects of head injury (p. 299).*
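
The case counts quoted above from Russell and Nathan (1946) can be checked with a little arithmetic. The following is only a minimal sketch in Python: the counts are those given in the text, while the category labels and variable names are paraphrases introduced here for illustration and are not Russell and Nathan's own terms.

```python
# Case counts as quoted in the text from Russell and Nathan (1946).
# The category labels are illustrative paraphrases, not the authors' terms.
total_cases = 1029
counts = {
    "no RA": 133,
    "RA up to 30 minutes": 707,
    "RA over 30 minutes": 133,
    "records unavailable": 56,
}

# The four categories should account for every case in the survey.
assert sum(counts.values()) == total_cases

# Share of the whole sample falling in each category, in percent.
shares = {label: round(100 * n / total_cases, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label:22s} {counts[label]:5d}  {pct:5.1f}%")
```

On these figures the four categories sum to the full sample of 1,029, and roughly two thirds of the patients fall in the group with RA of 30 minutes or less, which agrees with the remark that RA lasts "in most cases a few moments only."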





* Coons and Miller (1960) have recently
called attention to the possibility of sampling
artifacts confounding the consolidation
interpretation of clinical observations of
retrograde amnesia. Thus, they have pointed out
that, if an injury produces a general
decrement in memory, positive evidence for
memory is more likely to be secured while
examining the larger time samples involved
in remote memories as compared to recent
memories.

Experimentally induced RA has
produced the best evidence for the
existence of a consolidation process
since the results would be predictable
from perseveration theory, while the
primary competing theory, the
associative interference theory, has no
explanation to offer. We will therefore
turn now to a review of the various
experimental procedures used to
induce RA and the results obtained.


Electroconvulsive Shock


The introduction of electroshock
therapy in 1937 provided both the
impetus and the technical apparatus
for the laboratory study of RA.
Immediately after its introduction many
practitioners observed that
electroconvulsive shock (ECS) produced a
temporary postshock amnesia which
eventually shortened to a genuine RA
for events immediately preceding the
shock treatment. Zubin and Barrera
(1941) were the first investigators to
subject these observations to
systematic study. They trained 10
patients in a series of paired associate
lists to a criterion of two consecutive
correct repetitions. Learning occurred
either in the morning or evening,
while the retention tests were given
during the subsequent afternoon.
The same subjects were used in
control and experimental conditions, i.e.,
(a) with no shock intervening
between learning and the retention test,
and (b) with an ECS interpolated
after the morning learning session.
With no intervening shock there
were significant savings between
learning and relearning; with an
interpolated ECS there were no
significant savings. A comparison between
the effects of ECS on material learned 
the evening prior to shock with ma- 
terial learned the morning preceding 
shock indicated that recent material 
was more severely affected by ECS 
than remote material. The latter 
conclusion was based on rather small 
differences in savings scores and in- 
sufficient data are presented to per- 
mit adequate statistical evaluation. 
However, Flescher (1941), Williams 
(1950), and Cronholm and Molander 
(1958) have subsequently confirmed 
the substance of Zubin and Bar- 
rera’s assertions. The various in- 
vestigators using human subjects, al- 
though successfully employing ECS 
to interfere with memory, had not at- 
tempted to adequately define the 
time relations of such interference. 

This critically important step was 
taken by Duncan (1949). Duncan's
procedure involved training rats to
avoid shock to the feet in a shuttle-box
situation. A light, turned on 10
seconds prior to grid shock, served as
the conditioned stimulus (CS). The
animals received one trial per day for 
18 days and records were kept of the 
number of successful avoidance re- 
sponses. Nine groups of animals were 
used in the study. Rats in eight of 
these groups received an ECS after 
each day’s trial, the trial-ECS in- 
terval ranging from 20 seconds to 14 
hours. In the remaining group, the 
ear clips used for delivering the ECS 
were applied following each day’s 
trial but no current was passed. The 
results clearly indicated a deleterious 
effect of ECS on performance, the 
magnitude of the effect decreasing as 
the trial-ECS interval increased to 
produce a negatively accelerated 
curve. This general finding has since 
been confirmed by Ransmeier (1953),
Thompson and Dean (1955), and
Leukel (1957). All of the findings are 
compatible with the view that a 
single ECS can produce deficits in re- 
tention if delivered within 15 to 60 
minutes following a learning trial. 
Moreover, ECS induced immediately 
following the learning trial effectively 
obliterates nearly all retention of the
“learned” response. The studies
following Duncan’s have employed
different learning tasks. Leukel (1957)
and Ransmeier (1953) used maze 
learning situations with the ECSs 
being delivered at varying posttrial 
intervals. Thompson and his collab- 
orators have employed a visual dis- 
crimination learning task, with avoid- 
ance of grid shock as the motivating 
agent. In these latter studies 
(Thompson, 1957a; Thompson & 
Dean, 1955; Thompson & Penning- 
ton, 1957) a single ECS was ad- 
ministered at various intervals fol- 
lowing a series of massed trials in the 
apparatus. As a result of these ex- 
tensive experiments, it has been de- 
termined that ECS produces greater 
deficits in young than adult rats 
(Thompson, 1958a; Thompson, Har- 
avey, Pennington, Smith, Gannon, & 
Stockwell, 1958). Further, rats
suffering from anoxia-induced brain
damage (Pennington, 1958) show
greater deficits resulting from a single 
ECS than intact control animals. 
Both the findings with respect to age 
and those relating to brain damage 
are compatible with Thompson and 
his co-workers’ (1958) hypothesis 
that the extent of the deficit will be 
proportional to the number of cor- 
tical neurons available. Pennington 
(1958) has alternately suggested that 
the results obtained with brain dam- 
aged rats may be a function of a pro- 
longed perseveration process in these 
animals. 

Thompson and Pennington (1957)
have also found that the memory
decrement produced by a single ECS
was less after spaced trials than after 
massed trials. This result was ex- 
pected from the point of view of a 
perseveration theory as a joint func- 
tion of “firmer fixation of the mem- 
ory trace owing to a longer duration 
of perseveration” and “the lessened 
intensity of perseveration at the end 
of training due to dissipation of per- 
severative activity.” 

Although the empirical result of 
interference with performance by 
postlearning ECS has not been ques- 
tioned, the interpretation of the re- 
sults is not quite as clear. The points 
to be discussed below actually raise 
questions of interpretation which 
apply not only to the ECS proce- 
dures but to other interpolated phys- 
iological procedures as well. 

1. The most serious alternative to 
a consolidation interpretation of the 
ECS results has been offered by 
Miller and Coons (1955). These in- 
vestigators trained rats to eat in a 
runway and then shocked them while 
eating there. Avoidance was meas- 
ured by an increased latency of ap- 
proach to the eating place. ECSs 
were delivered to the animals at 
varying intervals after shock to the 
mouth. Miller and Coons reasoned 
that any aversive qualities of the 
ECSs might be expected to produce 
increased avoidance. On the other 
hand, if the ECS really interrupted 
consolidation, the subjects would 
show the opposite behavior, namely, 
approaching the food without
hesitation. In this experiment no evidence
was found for an attenuation of the 
avoidance response by the ECS, 
leading the authors to argue that the 
retardation in learning observed by 
Duncan (1949) was simply a function 
of placing the rat in a conflict
situation. In a more recent set of
experiments, Coons and Miller (1960) have
succeeded in opposing the conflict
and consolidation interpretations in a
double grid-box situation similar to 
that used by Duncan. Here again, 
their results indicate that ECS may 
not eliminate memory but merely in- 
duce anxiety or conflict which in- 
hibits performance of the response in 
question. They further buttress their 
contentions regarding the fear in- 
ducing qualities of ECS with obser- 
vations on increased defecation, uri- 
nation, and weight loss in those ani- 
mals for whom the performance of an 
otherwise rewarded response is fol- 
lowed by an ECS. In both of their 
studies, the ECS apparently sum- 
mated with the grid shock to produce 
a result which significantly favored a 
conflict as opposed to a perseveration 
interpretation. Observations of Gal- 
linek (1956) suggest that analogous 
anxiety builds up in human beings 
during the course of electroshock 
therapy. Such an interpretation is 
logically possible for both the avoid- 
ance situations used by Duncan and 
by Thompson and Dean, and the 
maze learning situations used by 
Leukel and by Ransmeier and Ger- 
ard. The standard control for this 
has been to employ groups receiving 
painful but nonconvulsive shocks. In 
these cases (Duncan, 1949; Leukel, 
1957; Ransmeier & Gerard, 1954) it 
has been found that (a) the decre- 
ments produced by the painful but 
nonconvulsive shocks are not nearly 
as severe as those produced by ECS 
at comparable intervals, and (b) the
posttrial interval during which pain- 
ful shocks produced their effect was 
always much shorter than that dur- 
ing which significant decrements 
could be produced by ECS. These 
latter control results would seem to
indicate that the ECS results are due
to more than just conflict. It might 
be argued, however, that the ECSs 
are sufficiently more painful or un- 
pleasant than the leg or tail shocks to 
account for the greater deficits pro- 
duced by the former. Reference to 
human subjects would suggest that 
this is not the case. Patients do not 
necessarily report pain as an ac- 
companiment of a properly delivered 
ECS (Stainbrook, 1948). In view of
this, and considering that there is no 
consistent experimental evidence for 
punishment obliterating verbal ma- 
terial (Rapaport, 1942), it seems un- 
likely that the deficits observed in 
humans following ECS (or any cere- 
bral trauma) can be explained purely 
in conflict terms. Finally, in regard 
to the animal literature, it seems 
reasonable to point out that Miller 
and Coons delivered a series of ECSs 
on successive days, whereas Thomp- 
son and his collaborators eliminated 
a persistently rewarded response with 
a single ECS. 

In order to explain Thompson's re- 
sults in conflict terms, one would 
have to assume a delay of reinforce- 
ment gradient lasting at least 60 min- 
utes, and the build up of a significant 
amount of fear following a single 
ECS. Such assumptions, although 
possible, would not be easy to sup- 
port at the present time. Clearly, 
however, other workers should carry 
out experiments utilizing designs 
similar to those employed by Miller 
and Coons, i.e., opposing the
consolidation and conflict interpretations.
The writer has used such a procedure 
in an experiment involving direct 
stimulation of the brain (Glickman, 
1958) and this could easily be 
adapted for ECS. Moreover, the one- 
trial learning situation employed in 
this latter experiment would permit
the use of a single ECS and enable
an exceedingly accurate estimate
of the trial-ECS interval.*


* Since this article went to press, there have
been two reports of experiments in which the
conflict and consolidation interpretations of
“forgetting” have been opposed in one-trial
learning situations. In both of these cases, in
which the introduction of various chemical
agents served as the interpolated procedure,
the results favored a consolidation
interpretation of the effects (Essman & Jarvik,
1960; Pearlman, Sharpless, & Jarvik, 1961).


2. An alternate interpretation of
the ECS results is also possible in
those studies employing food reward.
As Kohn (1951) and Berkun, Kessen,
and Miller (1952) have shown, the
rewarding properties of food are
derived in part from stimulation of
receptors within the mouth, and in
part from actions within the
stomach. ECS delivered shortly after a
learning trial might act to prevent
the perception of the feedback from
the stomach and thereby cut down on
the reinforcing properties of the food.
In view of the relatively minor
contribution of these stomach receptors,
particularly in the early stages of
learning, such effects are probably
insignificant in the studies of
Ransmeier (1953) and Leukel (1957).

3. A question has arisen about
distinguishing between ECS effects on
a time-limited consolidation process
and the more generalized memory
deficits which have been observed to
follow a series of ECSs (see
Stainbrook, 1946, for review). In
particular, Worchel and Gentry (1950) have
suggested that Duncan’s (1949)
finding of a limited period following
learning when an ECS will be
effective is a result of his failure to use
massed ECSs. On the basis of some
T maze data of their own, Worchel
and Gentry argue that Duncan might
have considerably extended the
duration of time during which ECS would
produce a deficit if he had given a
series of ECSs. Worchel and Gen- 
try’s results do not contradict the 
general finding that it is easier to 
disrupt learning in the period im- 
mediately following exposure to the 
learning situation. However, at the
present time, the ECS data are com- 
patible with the notion that the 
strengthening of memory traces is a 
continuous one throughout the life of 
the organism. For example, Brady 
(1951, 1952) has found evidence of 
spontaneous growth in the strength 
of a conditioned emotional response 
during a period of 90 days. On the 
basis of current evidence, one might 
expect that the interval following a 
learning trial, during which time
interference with retention can be
produced, is a direct function of the
degree of physiological severity of
the interpolated procedure.
Ultimately, there is probably some
practical limit on the time interval
between learning and ECS during 
which selective effects on retention 
can be produced. In addition, the 
effects of a series of ECSs delivered 
many hours or days after learning 
are often apparently temporary 
(Stainbrook, 1946). Brady (1951) re- 
ported that a series of ECSs sup- 
pressed a conditioned emotional re- 
sponse (CER) for a period of a 
month, although the habit reap- 
peared spontaneously at the end of 
that time. It has also been found 
that the effects of a series of ECSs 
may be selective for emotional re- 
sponses (Geller, Sidman, & Brady, 
1955). On the basis of the accumulated
data, it seems reasonable to
suggest that ECS may affect performance
in a number of ways including:
(a) a temporary suppressor
action involving those cerebral structures
mediating pain or anxiety responses
(such a mode of action would
explain the proactive effects noted
by Poschel, 1957, and Carson, 1957,
on avoidance conditioning) and (b)
a direct action on the neural circuits 
involved in memory which, if the 
learning-ECS interval is brief enough 
and the treatment sufficiently severe, 
may permanently erase the effects of 
such learning. 


Anoxia 


Hayes (1953) demonstrated equiv- 
alent retroactive effects of anoxia 
and ECS on maze learning in rats. 
He used a distributed practice proce- 
dure and administered the experi- 
mental treatment one hour after each 
trial. The experimental rats showed 
similar retardation in learning when 
their acquisition curves were com- 
pared with normal control animals. 
Hayes reports that histological ex- 
amination of the brains produced no 
clear evidence of brain damage for 
any of the animals. Ransmeier and 
Gerard (1954) have also reported 
disturbances in maze learning result- 
ing from anoxia, the magnitude of 
the disturbance decreasing “along 
characteristic curves with increasing 
intervals between training and ex- 
perimental procedures.” 

Using a discrimination learning 
procedure, Thompson and Pryer 
(1956) showed that anoxia, produced 
by placing rats in a decompression 
chamber during the postlearning
period, could lead to decrements in
retention analogous to those pro- 
duced by ECS. In a later study, 
Thompson (1957a) found that a
10-minute exposure to a simulated
30,000-foot altitude produced deficits 
equivalent to those resulting from 
ECS, although exposure to a 20,000- 
foot altitude did not produce such 
severe effects. Finally, Thompson
(1957a) has also reported that when
an ECS was given 30 seconds post- 
training, a subsequent 10-minute ex- 
posure to a simulated 30,000-foot 
altitude did not produce an addi- 
tional deficit. 


Temperature


A number of investigators have 
studied the effects of postlearning 
temperature on retention. In most 
of the earlier work (French, 1942; 
Hunter, 1932; Jones, 1943) the aim 
was to reduce the activity of the ex- 
perimental group and thereby reduce 
retroactive inhibition. Considered in 
the light of the ECS literature, these 
studies are not immediately relevant 
to the present review because of the 
prolonged interval between the learn- 
ing trials and the achievement of the 
desired temperature change. 

In the most recent studies of Cerf 
and Otis (1957) and Ransmeier and 
Gerard (1954) it appears that tem- 
perature may have some effect on 
processes related to consolidation. 
The former investigators gave gold- 
fish 10 massed trials in an avoidance 
situation using a shifting light as the 
CS. At varying intervals following 
the trials (0 minutes, 15 minutes, 60
minutes, or 4 hours) the body tem- 
peratures of different groups of 15 
to 19 subjects were raised briefly to a 
point sufficient to induce heat narco- 
sis (36.5°-37.0° C). In retention tests 
carried out the next day, the criterion 
of five consecutive correct responses 
in 10 trials was met by only 10.5% 
of the group narcotized immediately 
after learning, while 56.2% of the 
subjects paralyzed 4 hours following 
learning met the same criterion. The 
remaining two groups occupied inter- 
mediate positions. Fifty percent of a 
group of untreated control subjects 
also met the above criterion. Thus, 
the temperature induced narcosis
produced much the same effect in the
goldfish that ECS and anoxia have 
been found to produce in rodents. 
Ransmeier and Gerard (1954) did not 
find any evidence of retroactive ef- 
fects of lowered body temperatures 
on retention of a maze habit in the 
hamster. Gerard (1955) has reported, 
however, that lowering the body 
temperature will apparently prolong 
the period during which an ECS may 
produce severe deficits. Thus, “ham- 
sters kept cool between learning and 
electroshock show as great a disrup- 
tion of learning at an interval of one 
hour as warm ones do at an interval 
of fifteen minutes.” Evidently,
temperatures sufficient to impair
spontaneous activity in the brain as
indicated by the EEG will not act directly
to block consolidation, although they 
may slow down the chemical pro- 
cesses involved in the fixation of the 
trace. 

Fay (1940) has reported RA in 
human subjects for events occurring 
while the patients were refrigerated, 
i.e., when the body temperature fell 
below 33.3° C. Under these circum- 
stances, the subjects could respond 
to questions and carry on a conversa- 
tion, although interrogation after 
the refrigeration procedure showed a 
loss of memory for the entire inter- 
change. Such deficits could be ex- 
plained in terms of an impairment of 
activity in those structures responsi- 
ble for the consolidation process. 
However, alternative explanations 
are also possible. 


Anesthesia 


Leukel (1957) has reported that 
sodium pentothal injected
intraperitoneally (IP) after each learning
trial impaired acquisition in a maze 
in experimental rats when their time 
or error scores were compared with 
any of three control groups. Subjects
in the three control groups received
either: an IP injection of water fol- 
lowing each trial, an IP injection of 
pentothal 30 minutes following each 
trial, or no injection. The scores did 
not differ among these latter groups. 
Leukel interpreted his results in 
terms of interruption of consolidation 
of the memory trace in those sub- 
jects receiving pentothal one minute 
after each trial. 

On the other hand, Russell and 
Hunter (1937) and Ransmeier and 
Gerard (1954) have not found deficits 
in retention to result from postlearn- 
ing barbiturate anesthesia. There 
are numerous differences in proce- 
dure, however, which might account 
for this discrepancy. For example, 
Russell and Hunter (1937) admin- 
istered sodium amytal subcutane- 
ously after giving their experimental 
subjects five massed trials in a maze. 
They observed no effects of the in- 
jection on subsequent retention of 
the maze. However, the subcutaneous
route of injection undoubtedly
prolonged the time before the drug
took effect (in comparison with the
IP route used by Leukel). In addition,
the massed trials procedure
used by Russell and Hunter resulted 
in a longer interval between learning 
and anesthesia than the Leukel pro- 
cedure of injecting one minute fol- 
lowing each trial. 

Ransmeier and Gerard (1954) and 
D. Kimura and S.E. Glickman (un- 
published) failed to find retention 
deficits as the result of anesthetizing 
hamsters or rats with ether following 
maze learning trials, or electric shock 
in an avoidance learning situation, 
respectively. These results suggest 
that the apparent effectiveness of 
barbiturates, as opposed to ether, in 
blocking consolidation may be due to 
secondary effects of the former on 
blood chemistry or blood pressure
rather than direct synaptic interference.
Barbiturate anesthetics produce
many more severe blood changes
than ether including reductions in 
blood pressure and blood sugar level 
(Kohn, 1950). 

If anesthetics can be shown to 
exert reliable retroactive effects on 
learning, they may eventually prove 
useful in the localization of the neural 
structures crucial to consolidation. 
Techniques have recently been de- 
veloped which permit the delivery of 
small quantities of various drugs to 
restricted sites within the brain of a
“behaving” animal (Fisher, 1956;
Olds & Olds, 1958). Utilizing such 
techniques, it should be possible to 
selectively and temporarily block 
activity in various cerebral struc- 
tures during the period immediately 
following exposure to the learning 
situation and thereby determine 
which structures, if any, are crucial 
to the consolidation process. 


Brain Stimulation 


Mahut (1958), Glickman (1958), 
and Thompson (1958b) have reported 
retroactive effects of brain stimula- 
tion on learning. The stimulation was 
accomplished with chronically im- 
planted electrodes which permit the 
animal freedom of movement in the 
learning situation, but enable the 
experimenter to deliver a small elec- 
tric current to particular sites within 
the CNS at any chosen time. This 
technique enables much more specific 
delimitation of the structures in- 
volved in the presumed fixation pro- 
cess than, for example, ECS or 
anoxia. However, in the studies 
carried out thus far, there are nu- 
merous factors which serve to com- 
plicate comparisons among the stud- 
ies, as well as to rule out any simple
“consolidation” interpretation of the
results.

Mahut (1958) tested the effects of
stimulation of the nonspecific tha- 
lamic nuclei on the performance of 
rats in a Hebb-Williams maze. Brief 
bursts of 60-cycle, sine wave, 0.25- 
volt stimulation were delivered 
through implanted electrodes while 
the rat was eating in the goal box. 
Such stimulation produced poorer 
performance in the maze, when the 
error scores of these “thalamic” rats
are compared with those of rats re- 
ceiving either no stimulation or simi- 
lar stimulation of the midbrain teg- 
mentum. The possibility exists in 
this study that the effects of stimula- 
tion were not retroactive but con- 
temporary, i.e., interfered with the 
animals’ registration of the food 
reward. This might be clarified by a 
parametric investigation of the time 
interval between learning trial and 
stimulation, following the design of 
the ECS studies (Duncan, 1949;
Thompson & Dean, 1955).


Glickman (1958) examined the 
effects of stimulation of the midbrain 
portion of the arousal system on the 
acquisition of an avoidance habit in 
the rat. Three 20-second bursts of 
stimulation, at considerably higher 
voltages than those used by Mahut 
(1958), were delivered immediately 
following shock to the mouth while 
the subjects were eating at a distinc- 
tive metal food spout. In retention 
tests carried out the following day, 
the animals who had received re- 
ticular stimulation after mouth-shock 
showed less avoidance of the spout 
(more eating behavior) than control 
animals not receiving brain stimula- 
tion. The interpretation of this study 
is also complicated due to the par- 
ticular characteristics of the Hudson 
(1950) one-trial learning apparatus 
which evidently lead to a portion of 
the avoidance response being learned 
in the postshock period. Hudson has
reported that the visual scanning
which the animal engages in during 
the postshock period will reinforce 
the avoidance response. Thus, it is 
conceivable that the reticular stimu- 
lation could have simply interfered 
with an ongoing visual process rather
than retroactively interfering with 
previous learning. 

Thompson (1958b), in an ingeni- 
ously designed study which permits 
him to use each animal in a variety 
of experimental conditions, has re- 
ported interference with the per- 
formance of cats in an alternation 
task as the result of intracranial 
stimulation. This effect was achieved 
with bilateral stimulation of the 
caudate nucleus following each trial 
in a modified Wisconsin General Test 
Apparatus. Similar stimulation of 
the midbrain tegmentum did not 
produce the retroactive effect, al- 
though it did interfere with perform- 
ance when the stimulation was de- 
livered either before or during a given 
trial. In this case, the interpretation 
of the retroactive disruptive effects 
of caudate stimulation is complicated 
by the possible reinforcing properties 
of this stimulation. Brady, Boren, 
Conrad, and Sidman (1957) have 
reported positively reinforcing con- 
sequences of caudate stimulation in 
the cat. It seems plausible that 
stimulation in this region, following 
a particular response, would favor 
repetition of that response and might 
act in opposition to any alternation 
habit. Such an explanation might be 
an alternative to postulating inter- 
ference with a perseveratory process. 
Since it is possible to check on the 
rewarding properties of electrical 
stimulation, using a self-stimulation 
situation such as that used by Olds 
and Milner (1954), this factor could 
be easily controlled in future studies. 
In regard to the lack of effect of tegmental
stimulation, this may be explicable
in terms of the extensive
functional localization of reinforce- 
ment pathways which appears to 
exist in that region (Glickman, 1960; 
Olds & Peretz, 1959). Olds* has suggested
that the interference produced
by intracranial stimulation in learn- 
ing situations may be directly related 
to the reinforcing qualities of the 
stimulation. 

There are numerous studies dem- 
onstrating interference with learning 
as a result of intracranial stimulation 
(see Zeigler, 1957, for review). How- 
ever, most of these are not directly 
interpretable in terms of retroactive 
interference because the stimulation 
is delivered during the actual per- 
formance of the task. Nevertheless, 
as Thompson (1958b) suggests, inter- 
ference with consolidation may be at 
least a partial explanation of the 
deficits observed by Rosvold and 
Delgado (1956) coincident with cau- 
date stimulation. Similarly, Burns 
and Mogenson (1958) and Burns and 
Stackhouse (1959) have reported 
deficits in the acquisition of a bar 
pressing habit in the Skinner Box 
resulting from cortical stimulation.
As Burns and Stackhouse note, these 
results are compatible with a per- 
severation hypothesis. 


PHYSIOLOGICAL SUBSTRATE OF 
CONSOLIDATION 


Stellar (1957) has pointed out that 
physiological data have recently ac- 
cumulated which tend to support the 
existence of a system within the brain 
responsible for the permanent fixation 
of memory traces. Milner and Pen- 
field (1955) and Scoville and Milner 
(1957) have reported cases of tem- 
poral lobe ablation in man which 
produced severe impairment of the 


* J. Olds, personal communication, 1959.

ability to acquire new material postoperatively,
although preoperatively
acquired material was retained. Al- 
though the crucial structures have 
not yet been definitely localized, the 
hippocampus and amygdala appear 
to be directly involved. Along similar 
lines Brady, Schreiner, Geller, and 
Kling (1954) found interfering effects 
of amygdalectomy on the acquisition 
of an avoidance response in cats, al- 
though the same lesions produced in 
cats which had already acquired the 
habit led to no disturbance in per- 
formance. The anatomical and physi- 
ological data suggest numerous path- 
ways through which these relatively 
primitive temporal lobe structures 
could exert widespread effects on the 
remainder of the brain (Adey, Mer- 
rillees, & Sunderland, 1956; Green & 
Adey, 1956). For example, the continued
action of these temporal lobe
regions may be necessary to the
proper regulation of firing in the nonspecific
arousal system, which in turn
apparently exerts considerable influence
on cortical activity (Magoun,
1958).

The existence of structures within 
the brain which are crucial to the 
fixation of memory traces is not re- 
stricted to the vertebrate orders. 
Boycott and Young (1950) have 
identified a cerebral structure (the 
vertical lobe) requisite for fixation of 
visual memory in the octopus, and 
apparently homologous in function 
to the temporal lobe structures found 
in the higher vertebrates. Thus, 
removal of the vertical lobe drasti- 
cally impairs the ability of the ani- 
mal either to acquire a new visual 
discrimination habit (motivated by a 
combination of food and electric
shock), or to retain such a habit for 
any length of time following training. 
The nervous system of the octopus 
differs widely from the vertebrate
nervous system. However, the appearance
of a specialized fixation
mechanism in both invertebrates and 
vertebrates suggests that there is 
some evolutionary utility in a dual 
process underlying memory function. 

At a more molecular level, the most 
widespread hypothesis concerning 
the substrate of consolidation predi- 
cates its dependence on reverberatory 
circuits. This idea has its origins in 
the anatomical demonstrations of
Lorente de Nó (1938) and has been
subscribed to in varying forms by 
Hebb (1949), Young (1953), and 
Gerard (1955). The basic supposi- 
tion is that reverberatory activity 
maintains the memory until the 
permanent changes underlying fixa- 
tion of the trace have been completed. 
This dual process hypothesis of mem- 
ory fixation has the advantage of 
explaining why interference with 
neural activity immediately after
“learning” blocks retention while
similar procedures instituted at a
later time do not. One group of
studies which may be directly rele- 
vant to the reverberatory circuit hy- 
pothesis of consolidation has been 
carried out by B. D. Burns and his 
co-workers (Burns, 1954, 1958). 
Burns has developed a technique
which allows the isolation of small 
areas of cortex from the remainder of 
the brain, while leaving the blood 
supply to the area relatively unaf- 
fected. He has extensively studied 
the electrical activity of these iso- 
lated slabs in response to direct 
electrical stimulation. Interestingly 
enough, he has found: that a single 
train of pulses can initiate bursts of 
activity in one of these preparations 
lasting for 30 minutes or more; that 
such bursts of activity can be blocked 
by a subsequently applied electrical 
stimulus; that such activity becomes 
easier to evoke with repeated applications
of the stimulus; and that the
burst activity is apparently due, in 
part to reverberatory activity among 
groups of neurons, and in part to 
differential rates of depolarization 
within various segments of individual 
neurons. These first three observa- 
tions certainly coincide with what 
one would expect if such a process 
underlay consolidation. However, 
it is necessary to be cautious in gen- 
eralizing from the type of activity 
observed in these special preparations 
to that occurring in the intact brain. 
Burns (1958) himself has rejected 
these preparations as a general model 
for memory on the grounds that such 
circuits would be too susceptible to 
external interference. However, this 
is one aspect of the data which makes 
Burns’ findings so attractive as a 
model of the first phase of a dual pro- 
cess theory of memory, susceptibility 
of learned material to interference 
providing the main behavioral evi- 
dence for the existence of a consolida- 
tion process. 

Finally, moving to a still more 
molecular analysis of the problem, it 
is reasonable to inquire about the 
specific changes which might be pro- 
duced by some sort of perseverative 
process. Nearly all investigators 
have at this level proposed some sort 
of growth process or chemical change 
at the synapse. In this respect, our 
ideas have changed little from those 
of 1929 when Lashley wrote: 

We have today an almost universal acceptance 
of the theory that learning consists of modification
of the resistance of specific synapses
within definite conduction units of the nervous
system.


After expressing numerous reserva- 
tions about the adequacy of this 
assumption, Lashley concluded by 
noting that: 

The synapse is, physiologically, a convention 
to describe the polarity of conduction in the
nervous system of higher animals, together
with some similarities of function in the cen- 
tral nervous system and neuromuscular junc- 
tion. That these functions are due to the ac- 
tion of the intercellular membranes has not 
been directly demonstrated (p. 127). 


Here again, recent neurophysio- 
logical progress tempers Lashley’s 
skepticism. The synapse is no longer 
a “convention” but a point-at-able 
structure which can be photographed 
and studied with the electron micro- 
scope (Palay, 1956). Further, as 
Lloyd (1949) and Eccles (1953) have 
shown, the rapid firing of impulses 
across synaptic junctions can result in 
increased excitability of these syn- 
apses for periods lasting from minutes 
to hours. There is general agreement 
that this increased excitability re- 
sults from the firing of presynaptic 
fibers, although it is not yet clear 
whether this is in turn due to an ac- 
tual change in the dimensions of the 
synaptic knobs as suggested by Eccles
(1953, 1957) or if an alternate
explanation, e.g., Lloyd (1949), may
suffice. Eccles (1953) has proposed
this phenomenon of posttetanic po- 
tentiation as a general model for 
conditioning and memory. Such a 
proposal meets with many difficulties 
(Malmo, 1954). However, there is no 
question that a person ascribing 
learning to changes in synaptic ex- 
citability could do so with more con- 
fidence today than was possible 30 
years ago. 


CONCLUSIONS 


In the opinion of the writer, the 
over-all weight of evidence certainly 
favors the existence of some mechanism
of consolidation (in spite of the
fact that alternative explanations
are possible for many of the experi- 
ments which supposedly support the 
existence of such a process). Further- 
more, the application of available 
physiological procedures appears to 
offer a promising approach to defining
the structures involved in the
fixation of memory traces. The most 
severe problems presented thus far 
have occurred as the result of con- 
founds in the behavioral test situa- 
tions employed, rather than through 
some defect in the modes of physio- 
logical interference. These problems 
are not insoluble, however, and an 
attempt was made to indicate this in 
the text of the paper. 

As a final point, the material re- 
viewed suggests the possibility that 
pseudoneurological speculation, re- 
sulting from strictly behavioral ob- 
servation, can result in productive 
physiological research—when the 
speculation is shrewdly conceived. 
Moreover, the physiologist would ap- 
pear to have already begun to repay 
this debt by suggesting purely be- 
havioral studies or new interpreta- 
tions of behavioral data. The studies 
demonstrating interfering effects of 
visual stimulation interpolated im- 
mediately after visual discrimination 
learning (Thompson, 1957b; Thomp- 
son & Bryant, 1955) are examples of 
such physiologically influenced “behavioral”
investigations. Along similar
lines, Walker’s (1958) reinterpretation
of reaction decrement, spontaneous
alternation data, in terms of
mechanisms serving to protect con- 
solidation, appears to be equally 
sensitive to current physiological 
research.

The term anxiety has enjoyed
great popularity in the writings and 
researches of psychologists in the last 
decade, and procedures for measuring 
this hypothetical state have pro- 
liferated wildly. There is every indi- 
cation that psychologists will con- 
tinue to develop and employ measures 
of anxiety in many areas of research, 
especially in the rapidly expanding 
number of studies of psychothera- 
peutic process and change, in the 
already booming area of psychophar- 
macology, in studying the effects of 
anxiety on performance, and in at- 
tempts to assess such constructs as 
aggression anxiety or sex anxiety. It 
is the purpose of this paper to first 
impose some restrictions upon the 
definition of anxiety, and then to 
focus upon the problem of assess- 
ment by physiological-behavioral 
measures. No attempt will be made 
in this paper to review the research 
and evaluate the problems associated 
with assessing anxiety by self-report 
techniques. 

One’s theoretical approach to anxi- 
ety affects how one goes about meas- 
uring it; likewise results of attempts 
to assess anxiety should eventually 
modify and help refine the theoretical 
conception of anxiety. Thus the 
initial comments about the nature of 
anxiety should be considered as a 
rough formulation only, with both 
assessment procedures and theory 
modifying each other as investigation 
proceeds. It is recognized that this 
formulation, rough as it is, cannot
include all that anxiety means to all
people, and that accordingly to make
this review manageable, it is necessary
to delimit the concept.

1 This paper was prepared in part while the
author held a visiting appointment at the
University of North Carolina. The author
wishes to thank Earl Baughman and Leonard
Berkowitz for their helpful suggestions.


A CONCEPTION OF ANXIETY 


As a starting point it is proposed 
that the construct of anxiety be con- 
sidered similar and perhaps identical 
to the reaction of fear, the neuro- 
physiological bases for which are not 
completely known but would seem to 
especially involve the functions of 
the posterior hypothalamus and its 
effects upon the sympathetic nervous 
system, the adrenal medulla, and the 
pituitary-adrenocortical system. The 
brain stem reticular formation may 
also play a part in this reaction. It is
recognized that this is undoubtedly
an oversimplification of the complex
and interacting neurophysiological
mechanisms involved in fear. This 
reaction may be largely innate yet it 
is likely that as a result of learning or 
constitutional predisposition individ- 
uals tend to have variations in the 
manner in which the anxiety reaction 
is expressed. 

It is further proposed that anxiety 
represents only one of many arousal 
states that can be differentiated from 
a more general state of activation as 
arousal becomes more intense. Thus 
the arousal that occurs when a person 
passes from a sleeping or very relaxed 
state to a waking, behaving state may 
be of a fairly generalized sort with no 
specialized affective or motivational 
reactions involved. However, as 
arousal becomes more intense, differ- 
entiation probably occurs and dis- 
tinctive arousal states may emerge 
relating to such constructs as anxiety,
anger, hunger, sex, or other emotional
or motivational states. Although it is 
possible that research will suggest 
the value of distinguishing anxiety 
from fear at the response level, or one 
kind of anxiety from another, it is 
perhaps best to demonstrate the 
utility of one construct of anxiety 
and its distinctiveness from other 
arousal states before adding unneces- 
sarily to the number of theoretical 
constructs extant. 

Anxiety also possesses the property 
of being highly learnable: that is, the 
hypothetical response becomes read- 
ily conditioned to stimuli that do not 
innately elicit the response. This 
characteristic renders difficult if not 
impossible any attempt to define 
anxiety on the basis of stimuli that 
elicit it, since the stimuli that elicit it 
will vary widely from person to per- 
son. An exception would be direct 
electrical stimulation of the brain 
(Miller, 1958), where the effective 
antecedent stimulus might be well 
defined. 

As a consequence of the difficulty 
of approaching the construct of anxi- 
ety from the stimulus side in human 
subjects, the primary emphasis in 
this paper will be to review research 
relevant to the assessment of anxiety 
in terms of response patterns. The 
observable responses from which one 
might infer the strength of the anxi- 
ety reaction are of two basic types: 
physiological-behavioral responses 
and self-report responses. As previ- 
ously mentioned, this paper will be 
primarily concerned with the first 
type of response. 

In addition to the hypothetical 
anxiety state and its observable 
manifestations there are two other 
variables intimately related to anxi- 
ety which are kept conceptually 
distinct in the present view: namely, 
those stimuli (external or internal) 
which elicit the anxiety response, and 
those responses which have been
learned because they reduce or avoid
the anxiety response. From the 
point of view of measurement the 
stimuli that evoke anxiety become 
important only if one wants to know 
what situations or thoughts or feel- 
ings elicit anxiety. Thus the common 
distinction between anxiety and fear 
in terms of the latter being in response 
to a realistic danger and the former 
being a response to unrealistic or un- 
known threats is basically a stimulus 
defined difference and does not neces- 
sarily involve a difference in response. 

There exists a possible source of 
confusion with respect to the re- 
sponses that have been learned to 
reduce anxiety in that clinicians 
frequently infer anxiety on the basis 
of these ‘defenses’ against anxiety 
as much as from direct expression of 
the anxiety itself. Again, from the 
point of view of theory as well as 
measurement it is preferable to keep 
these two variables distinct if pos- 
sible. In fact, it would seem likely 
that when a person is making a successful
“defensive” response, no anxiety
is present. To the extent that
this is so it would be misleading to 
infer the strength of the momentary 
anxiety level from the presence of 
learned anxiety reducing responses. 


THE MEASUREMENT OF ANXIETY 


The foregoing theoretical analysis 
suggests that in spite of individual 
variations in response there might 
still be some pattern of physiological- 
behavioral responses associated with 
anxiety arousal that would be distinct 
from other patterns of response asso- 
ciated with other emotional or arousal 
states. Findings based primarily on 
physiological response patterns will 
be considered first followed by find- 
ings based primarily on behavioral 
response patterns. Two basic ques- 
tions will be asked with respect to 
both the physiological and the behavioral
evidence: (a) Does a distinctive
pattern of responses emerge,
tentatively identifiable as reflecting
anxiety, that can be distinguished
from other patterns associated with
other arousal states, when the differing
arousal states have been experimentally
induced? (b) What is the
nature of the intercorrelations among 
physiological or behavioral measures 
which have been obtained under the 
same experimental conditions, and is 
there any evidence of a distinguish- 
able cluster of intercorrelated vari- 
ables that might be tentatively 
identified as reflecting anxiety? The
studies do not always lend themselves
to a clear-cut analysis in these
terms but these are the guiding ques- 
tions being considered. 


Physiological Measures: Experimental 
Comparisons 


The studies of primary interest 
here are those in which an attempt 
was made to distinguish between two 
or more experimentally induced 
arousal states where one of these was 
considered to represent a fear or anxi- 
ety reaction. There are three studies 
that most closely follow this para- 
digm. Ax (1953) reports a study in 
which a variety of physiological 
measures were obtained from normals 
under conditions presented in coun- 
terbalanced order that were designed 
to elicit fear and anger, respectively. 
The fear condition was ingeniously 
contrived to make the subject think 
that the apparatus was faulty and 
that he was in real danger of receiv- 
ing a severe, perhaps even fatal, 
electric shock. Anger was aroused by 
an obnoxious assistant who generally 
insulted and belittled the subject. 
Schachter (1957) repeated Ax’ study 
using hypertensive, potential hyper- 
tensive, and normotensive subjects, 
and added a pain experience (cold 
pressor test) to the fear and anger 
situations. All subjects received the 
treatments in the same order: pain,
fear, and anger. Lewinsohn (1956)
obtained three physiological measures 
plus a measure of finger tremor on 
groups of normals, anxiety reaction 
patients, ulcer patients, and hyper- 
tensive patients subjected in coun- 
terbalanced order to the cold pressor 
test and a failure experience accompanied
by criticism and electric
shock. Another study that is highly
relevant to the issue but which em- 
ployed a somewhat different research 
strategy is that of Funkenstein, King, 
and Drolette (1957). After stressing 
their college student subjects they 
determined in a poststress interview 
whether a subject had tended to 
experience anger outwardly directed, 
anger inwardly directed, or anxiety. 
The scores obtained were limited to 
blood pressure and ballistocardiographic
measures.

The results of these four studies 
are summarized in Table 1. Most 
scores in the Ax and Schachter stud- 
ies represent difference scores be- 
tween prestress resting level and the 
highest (or in some cases the lowest) 
level reached during stress. The 
scores in the Lewinsohn study repre- 
sent differences in the mean during 
rest and the mean during stress, with 
the exception of the GSR score which 
represents the largest deflection dur- 
ing stress. All scores reported from 
the Funkenstein study are percent- 
age changes from prestress levels. 
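The scoring conventions described above, together with the resistance transformation given in the footnote to Table 1, can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, the sample values in the usage lines are invented, and base 10 is assumed for the logarithm, which the source does not specify.

```python
import math


def peak_difference(rest_level, stress_samples, direction="max"):
    """Ax/Schachter-style score: the highest (or lowest) level reached
    during stress minus the prestress resting level."""
    extreme = max(stress_samples) if direction == "max" else min(stress_samples)
    return extreme - rest_level


def mean_difference(rest_samples, stress_samples):
    """Lewinsohn-style score: mean during stress minus mean during rest."""
    rest_mean = sum(rest_samples) / len(rest_samples)
    stress_mean = sum(stress_samples) / len(stress_samples)
    return stress_mean - rest_mean


def percent_change(prestress_level, stress_level):
    """Funkenstein-style score: percentage change from the prestress level."""
    return 100.0 * (stress_level - prestress_level) / prestress_level


def log_resistance_score(initial_resistance, lowest_resistance):
    """Transformation log 1/(R_initial - R_lowest) from the Table 1 footnote.
    Base 10 is an assumption. A larger drop in resistance gives a more
    negative score."""
    drop = initial_resistance - lowest_resistance
    if drop <= 0:
        raise ValueError("lowest resistance must be below initial resistance")
    return math.log10(1.0 / drop)


if __name__ == "__main__":
    # Invented values, for illustration only.
    print(peak_difference(70, [72, 85, 80]))
    print(mean_difference([70, 72], [80, 84]))
    print(percent_change(120, 150))
    print(log_resistance_score(150, 50))
```

Because the four conventions differ in both unit and reference point, scores from the different studies can be compared in direction but not directly in magnitude.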

In spite of some inconsistencies 
among the studies there does appear 
to be evidence for distinguishable 
response patterns that can be tenta- 
tively associated with the constructs 
of fear (anxiety) and anger. Diastolic 
blood pressure increased more for 
anger than fear in all three studies in 
which fear and anger states were 
thought to be aroused (significantly 
different from chance in two studies). 

TABLE 1
COMPARISON OF PHYSIOLOGICAL MEASURES ASSOCIATED WITH
DIFFERENT EMOTIONAL AROUSAL STATES IN FOUR STUDIES

[Table body not recoverable. Measures listed: systolic blood pressure; diastolic blood pressure; heart rate (+); heart rate (-); cardiac output; peripheral resistance; hand temperature (-); palmar conductance; largest deflection in stress, GSR; no. GSRs; respiratory rate; frontalis muscle tension; no. muscle potential peaks; finger tremor; salivary output.]

a Schachter used the transformation log 1/(R_i - R_l), where R_i = initial resistance and R_l = lowest resistance during stress. The smallest negative number, -1.99, for fear accordingly refers to the largest decrease in resistance.
* Significant at the .05 level; for Schachter this is based on an overall analysis of variance for the three conditions.

Heart rate increased more in fear
than anger in all three studies (significant
in two). Maximum heart rate
decrease was significantly greater in
anger than fear in the one study in 
which it was reported. Cardiac out- 
put increased significantly more in 
fear than anger in the two studies 
in which it was reported, and periph- 
eral resistance decreased significantly 
more in fear than anger in both studies
where it was reported. Palmar conductance
increased significantly more
in fear than anger in the two studies 
where it was reported. Number of 
discrete GSRs, however, was signifi- 
cantly higher in anger than fear in the 
one study where this was measured. 
Respiration rate increased signifi- 
cantly more in fear than anger in the 
two studies reporting this measure. 
Frontalis muscle tension increased 
more in fear than anger in the two 
studies measuring it (significant in 
one). 

Another study that was not in- 
cluded in the tabular presentation 
provides additional support for the 
different heart rate responses associated
with anxiety and anger. DiMascio,
Boyd, and Greenblatt (1957)
studied one psychotherapy patient
over 11 interviews and found a correlation
(rho) of .69 between average
heart rate and amount of rated tension
(anxiety?) in the interviews, and
a correlation of -.37 between average
heart rate and amount of rated
antagonism in the interviews. 
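For readers unfamiliar with the rank correlation (rho) reported above, the computation can be sketched as follows. The sketch assumes the usual Spearman procedure (a product-moment correlation computed on ranks, with ties given their average rank). The 11 pairs of values in the usage lines are invented for illustration and are not the data of the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-interview values for 11 interviews (invented).
    heart_rate = [78, 82, 75, 90, 85, 80, 88, 76, 84, 79, 91]
    rated_tension = [2, 3, 1, 5, 3, 2, 4, 2, 4, 1, 5]
    print(round(spearman_rho(heart_rate, rated_tension), 2))
```

With a single patient and only 11 interviews, a coefficient of this kind describes covariation within one person and has wide sampling error.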

The two studies in Table 1 involv- 
ing a painful experience, the cold 
pressor test, suggest that this arousal 
state may also be distinguishable 
from fear, although the differentia- 
tion of pain and anger is less clear. 
It is, of course, not possible to know 
from these results how specific these 
reactions might be to the cold pressor 
test as opposed to pain stimulation 
generally. 

Funkenstein et al. (1957) propose a 
theory that may serve to provide 
some integration for these various 
findings. They suggest that the 
physiological reaction accompanying 
anger-out is a norepinephrine-like 
reaction and that accompanying anxi- 
ety is an epinephrine-like reaction. 
The physiological reactions accom- 
panying injections of epinephrine 
and norepinephrine have been in- 
vestigated by Goldenberg, Pines, 
Baldwin, Greene, and Roh (1948), 
Barcroft and Konzett (1949), De- 
Largy, Greenfield, McCorry, and 
Whelan (1950), Goldenberg (1951), 
Swan (1952), and Clemens (1957). 
In general it is found that epinephrine
leads to increased palmar conductance,
systolic blood pressure,
heart rate, cardiac output, forehead 
temperature, central nervous system 
stimulation, blood sugar level; and 
decreased diastolic blood pressure,
peripheral resistance, hand tempera- 
ture, and salivary output. Norepi- 
nephrine leads to increased systolic 
and diastolic blood pressure and pe- 
ripheral resistance, no change or a 
slight decrease in heart rate and 
cardiac output, and only slight in- 
creases in central nervous system 
stimulation and blood sugar level. 
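The directional patterns just listed can be written down as a small lookup table, against which an observed set of changes may be compared. The sketch below is hypothetical throughout: the agreement score is a simple proportion of matching directions devised for illustration, not the index computed in any of the studies reviewed, and coding "no change or a slight decrease" as zero is a simplifying assumption.

```python
# Direction of change reported for each hormone: +1 increase, -1 decrease,
# 0 no change. "No change or a slight decrease" is coded 0 (an assumption).
EPINEPHRINE = {
    "palmar_conductance": +1,
    "systolic_bp": +1,
    "heart_rate": +1,
    "cardiac_output": +1,
    "forehead_temperature": +1,
    "cns_stimulation": +1,
    "blood_sugar": +1,
    "diastolic_bp": -1,
    "peripheral_resistance": -1,
    "hand_temperature": -1,
    "salivary_output": -1,
}

NOREPINEPHRINE = {
    "systolic_bp": +1,
    "diastolic_bp": +1,
    "peripheral_resistance": +1,
    "heart_rate": 0,
    "cardiac_output": 0,
    "cns_stimulation": +1,
    "blood_sugar": +1,
}


def sign(value):
    """Return +1, 0, or -1 according to the sign of value."""
    return (value > 0) - (value < 0)


def agreement(observed_changes, reference):
    """Proportion of shared measures whose observed direction of change
    matches the reference pattern (hypothetical scoring rule)."""
    shared = [m for m in observed_changes if m in reference]
    if not shared:
        raise ValueError("no measures in common with the reference pattern")
    hits = sum(1 for m in shared if sign(observed_changes[m]) == reference[m])
    return hits / len(shared)


if __name__ == "__main__":
    # Invented change scores for one subject under stress.
    observed = {
        "heart_rate": 12.0,
        "diastolic_bp": -4.0,
        "cardiac_output": 1.1,
        "peripheral_resistance": -0.2,
    }
    print(agreement(observed, EPINEPHRINE))
    print(agreement(observed, NOREPINEPHRINE))
```

A scheme of this kind ignores the magnitude of each change, so it can only indicate which pattern a response resembles, not how strongly.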

It is generally thought that reac- 
tions associated with norepinephrine 
are more limited, possibly restricted 
to peripheral vasoconstriction result- 
ing from secretion at the sympathetic 
nerve endings, than are the reactions 
to epinephrine. However, no studies 
were found in which the effects of 
injected norepinephrine upon a wide 
range of responses including palmar 
conductance, hand or finger tempera- 
ture, respiration rate, salivary out- 
put, or muscle potentials were as- 
sessed. In terms of the measures that 
have been obtained under both kinds 
of hormonal injections (Barcroft & 
Konzett, 1949; DeLargy et al., 1950; 
Goldenberg et al., 1948), heart rate, 
diastolic blood pressure, cardiac out- 
put, and peripheral resistance ap- 
pear to be the most discriminating. 
Neither cardiac output nor periph- 
eral resistance is readily obtainable 
by direct measurement. Cardiac out- 
put is usually inferred from ballisto- 
cardiographic measures, and periph- 
eral resistance is usually estimated by 
dividing mean arterial blood pressure 
by cardiac output. 
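The estimate of peripheral resistance described above amounts to a single division, sketched below. The function names and example values are hypothetical. The formula used here for mean arterial pressure (diastolic pressure plus one third of the pulse pressure) is a common approximation supplied as an assumption; the source does not say how mean pressure was obtained.

```python
def mean_arterial_pressure(systolic, diastolic):
    """Approximate mean arterial pressure as diastolic pressure plus one
    third of the pulse pressure (an assumption, not taken from the text)."""
    return diastolic + (systolic - diastolic) / 3.0


def peripheral_resistance(mean_arterial, cardiac_output):
    """Estimate peripheral resistance as mean arterial blood pressure
    divided by cardiac output, as described in the text."""
    if cardiac_output <= 0:
        raise ValueError("cardiac output must be positive")
    return mean_arterial / cardiac_output


if __name__ == "__main__":
    # Invented values: pressures in mm Hg, cardiac output in liters per minute.
    pressure = mean_arterial_pressure(120.0, 80.0)
    print(peripheral_resistance(pressure, 5.0))
```

Since cardiac output is itself inferred from ballistocardiographic measures, any error in that estimate carries directly into the resistance figure.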

Funkenstein et al. (1957) divided 
their subjects into subgroups on the 
basis of epinephrine-like, norepi- 
nephrine-like, and indeterminate re- 
actions and found a highly signifi- 
cant relationship in the expected 
direction between these physiological 
reaction types and the tendency to
respond by anger-out as opposed to
anxiety. Schachter (1957) making 
use of a greater variety of physiologi- 
cal measures likewise computed an 
index of epinephrine- and norepi- 
nephrine-like reactions and found 
these indices to vary significantly as 
a function of the pain, anger, and 
fear conditions with pain showing the 
most norepinephrine-like reaction 
and fear the most epinephrine-like re- 
action with anger falling in between. 

Although it would be premature to 
conceptualize the anxiety reaction as 
being entirely defined by the results 
of epinephrine secretion, the distinc- 
tion between the epinephrine- and 
norepinephrine-like reactions may 
well be an important one for anxiety 
measurement. The secretion of 
epinephrine and norepinephrine from 
the adrenal medulla and the release 
of norepinephrine at the sympathetic 
nerve endings are all affected by 
sympathetic nervous system stimula- 
tion. The fact that these two hor- 
mones produce quite different reac- 
tions points up what has long been 
known: namely, that it is a great 
oversimplification to speak of sym- 
pathetic arousal as if it were a uni- 
tary function. Although the response 
pattern associated with experimen- 
tally induced anxiety conforms rather 
closely to the response pattern asso- 
ciated with epinephrine injection, 
the response pattern associated with 
anger is not as closely related to the 
responses produced by norepineph- 
rine injection. Perhaps the distinc- 
tion between anxiety and anger, at 
the humoral level, is one involving 
the relation of epinephrine to norepi- 
nephrine in which anxiety is associ- 
ated with a purer epinephrine-like 
reaction and anger with a mixed 
pattern of epinephrine and norepi- 
nephrine responses. 

There are other studies where one 
or two physiological measures have 
been obtained under conditions likely
to arouse anxiety. For example,
Hickham, Cargill, and Golden (1948) 
found heart rate and cardiac output 
to increase substantially in medical 
students before what was considered 
to be an anxiety arousing situation, 
an oral examination, as compared to 
more relaxed conditions a month 
later. Likewise, Malmo, Boag, and 
Smith (1957) report increased heart 
rate in neurotic subjects after criti- 
cism as compared with decreased 
heart rate after praise. Although 
studies of this kind tend to be con- 
sistent with the previously described 
studies, they do not shed additional 
light on the question of whether some 
pattern of response related to anxiety 
can be differentiated from patterns 
of response associated with other 
kinds of arousal states. 

Davis (1957), Davis and Buchwald 
(1957), and Davis, Buchwald, and 
Frankman (1955) also report evi- 
dence that different stimuli elicit 
distinctive autonomic response pat- 
terns. There is no reason to believe, 
however, that any of their stimuli, 
for example, pictures of nudes, land- 
scapes, etc., were likely to evoke 
anxiety in many of their subjects. 
These studies do point to the possible 
subtleties in autonomic patterns as- 
sociated with various kinds of stimu- 
lation or arousal states, and caution 
against any too ready acceptance of 
some particular pattern as being the 
anxiety or the anger pattern. All of 
the studies described thus far, though, 
are consistent with the possibility 
that some pattern of physiological 
measures may allow one to infer the 
magnitude of the hypothetical anxi- 
ety reaction differentially from other 
hypothetical states such as anger or 
pain. 

Physiological Measures: Group Comparisons


There is a host of studies in which 
physiological measures are contrasted
between normals and various
clinical groups presumed to be in general
more anxious than the normals.
The studies that will be considered 
here are those involving patient 
groups in which the presence of mani- 
fest anxiety was reported to be a 
prominent part of the symptom pic- 
ture; accordingly, much of the physi- 
ological research on such psychoso- 
matic disorders as hypertension and 
peptic ulcer will not be summar- 
ized. 

Sherman and Jost (1942) found 15 
neurotic children to have lower rest- 
ing level palmar conductance than 18 
well adjusted children, but more 
resting level hand tremors, lower per- 
centage of alpha rhythm in the EEG, 
and faster respiration rate than well 
adjusted children. No differences 
were found for heart rate or blood 
pressure. Although measures were 
taken in a series of seven conditions, 
the results described above appeared 
to represent differences in general 
level rather than different degrees of 
reaction to the various conditions. 
Jurko, Jost, and Hill (1952) obtained 
measures on 25 normals, 20 neurotics, 
and 10 schizophrenics (all adults) 
while administering the Rosenzweig 
P-F test, and found heart rate, 
respiration rate, and respiration vari- 
ability higher in patient than normal 
groups before and during the test 
administration. A body movement 
score was highest for the schizo-
phrenics and lowest for the normals.
Palmar conductance was again found 
to be inconsistent with the general 
pattern, being highest for the nor- 
mals and lowest for the schizophrenics 
before and during test administra- 
tion. In neither of these two studies, 
however, was any attempt made to 
restrict the sample of neurotics to 
patients in whom anxiety was the
most prominent symptom. 

GSR conditioning rate, on the 
other hand, has been found to be
faster in more anxious subjects
(Bitterman & Holtzman, 1952; Schiff, 
Dougan, & Welch, 1949; Welch & 
Kubis, 1947). 

White and Gildea (1937) found 
that patients in whom anxiety was a
prominent symptom showed greater 
heart rate increases to the cold pressor 
test than did normals. On the surface 
such a finding appears contradictory 
to the results of Schachter (1957) in 
which the physiological responses 
associated with the cold pressor test 
were clearly distinguishable from 
those associated with anxiety. White 
and Gildea, however, obtained meas- 
ures during a rest period, during a 
brief anticipation period in which the 
experimenter moved the dish of ice 
water close to the subject, and during 
the immersion itself. For the normal 
group the average heart rates for 
these three periods were 75.7, 81.5, 
and 80.0, respectively; and for a 
group of anxiety neurotics 81.0, 90.0, 
and 95.5, respectively. Clearly, it
was the anticipation of the experience
that led to increased heart rate for
the normals, not the pain experience
itself. The anxious patients likewise
showed their greatest increase during
anticipation. These results suggest
that anticipation of the cold pressor 
test is anxiety arousing, and might 
yield a different pattern of response, 
in normals at any rate, than the pain 
experience itself. 
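
The arithmetic behind this reading of White and Gildea's means can be laid out in a short sketch. The heart rate values are those quoted above; the dictionary layout and the function name `period_changes` are illustrative assumptions, not part of the original report.

```python
# Mean heart rates for the three periods, as quoted in the text
# from White and Gildea (1937).
heart_rate = {
    "normal": {"rest": 75.7, "anticipation": 81.5, "immersion": 80.0},
    "anxiety neurotic": {"rest": 81.0, "anticipation": 90.0, "immersion": 95.5},
}


def period_changes(means):
    """Change from rest to anticipation, and from anticipation to immersion."""
    return {
        "rest_to_anticipation": round(means["anticipation"] - means["rest"], 1),
        "anticipation_to_immersion": round(means["immersion"] - means["anticipation"], 1),
    }


changes = {group: period_changes(means) for group, means in heart_rate.items()}
for group, change in changes.items():
    print(group, change)
```

For the normals the rise occurs entirely between rest and anticipation, with a slight drop during immersion; for the anxiety neurotics the rate rises in both steps, but the larger step is again the one into anticipation.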

The above results of White and 
Gildea as well as the results of 
Schachter (1957) and Lewinsohn 
(1956) argue against the theoretical 
formulation of Mowrer (1939) that 
anxiety (fear) is the conditioned 
form of the pain reaction. 

In the Lewinsohn (1956) study 
previously mentioned, resting level 
palmar conductance was highest for 
the anxiety reaction group and lowest 
for the ulcer group, with normals and 
hypertensives falling in between. 
Resting level salivary output was
highest in the ulcer group, and lower
and about the same for the other 
three groups. Somewhat surprisingly, 
resting level heart rate was lowest for 
the anxiety group. The change scores 
showed no particular tendency to be 
associated with the diagnostic groups. 
Wishner (1953) found resting level 
heart rate to be higher in 11 anxiety 
neurotics than in 10 normals and a 
tendency, not significant, for respira- 
tion rate to be faster in the neurotics. 
Funkenstein, Greenblatt, and Solo- 
mon (1951, 1952) conclude that pa- 
tients with anxiety and depressive 
symptoms are manifesting a chronic 
epinephrine-like reaction, whereas 
patients with paranoid tendencies or 
who are otherwise directing their 
anger and blame upon the external 
world are manifesting a chronic 
norepinephrine-like reaction. Their 
conclusions are based primarily on 
the patients’ reactions to the mech- 
olyl test (Funkenstein, Greenblatt, 
& Solomon, 1950). 

Malmo (1950, 1957) has summar- 
ized his research with respect to 
physiological measures found to dis- 
criminate between normals and pa- 
tients with pathological degrees of 
anxiety. In his 1957 article he con- 
cludes that anxious patients show 
greater reactivity in many measures 
regardless of the kind of stress used. 
Thus, Malmo and Shagass (1949a) 
using a painful thermal stimulation 
of the forehead as their stress found 
anxiety neurotics and early schizo- 
phrenics to show more finger move- 
ments, greater neck muscle poten- 
tials, more head movements, more 
respiratory irregularities, and greater 
heart rate variability than normal 
controls. Percent change of the GSR 
showed no significant relationship. 
These results have been generally 
borne out in other studies using dif- 
ferent stresses: Malmo, Shagass, and 
Davis (1951); Malmo, Shagass, Be-
langer, and Smith (1951). The results
of Malmo and Smith (1955) suggest
frontalis muscle tension may be a 
more sensitive discriminator between 
normals and anxiety neurotics than 
forearm muscle tension. 

Wenger (1948) using considerably 
larger Ns than most investigators 
compared resting state physiological 
measures of 225 patients with the 
diagnosis of operational fatigue, 98 
hospitalized psychoneurotics, and a 
normative group of 488 unselected 
preflight students in the Army Air 
Force. The 10 measures that signifi- 
cantly discriminated between the 
operational fatigue group and the 
normal group were salivary output, 
palmar conductance, systolic and 
diastolic blood pressures, sinus ar- 
rhythmia, heart period, sublingual 
temperature, finger temperature, res- 
piration period, and tidal air mean. 
The operational fatigue group showed 
sympathetic dominance on all of the 
above measures except sublingual 
temperature. For 47 patients in the
operational fatigue group Wenger
obtained repeat measures on most
variables at a later time when they
were considered improved and ready
to return to duty. Of the 20 variables
tested only palmar conductance,
heart period, and finger temperature 
showed significant changes, and these 
were all in the direction of lessened 
sympathetic arousal. The results 
with respect to the hospitalized psy- 
choneurotics, although not yielding 
exact correspondence on specific
measures, also showed a strong sym- 
pathetic dominance for this clinical 
group. 

Gunderson (1953) obtained 12 
resting state autonomic measures, 
selected on the basis of Wenger's 
previous work, on a sample of 110 
early schizophrenics with an average 
length of hospitalization of about 2 
years. Nine measures—salivary out-
put, dermographic latency, dermo-
graphic persistence, systolic blood
pressure, diastolic blood pressure,
finger temperature, heart rate, respi- 
ration rate, and sublingual tempera- 
ture—were significantly different 
from Wenger’s normative group of 
aviation cadets, and with the excep- 
tion of sublingual temperature all 
were in the direction of greater sym- 
pathetic arousal. Palmar conduct- 
ance failed to discriminate and was, 
in fact, almost identical for the two 
groups. This schizophrenic sample 
also showed significantly greater sym- 
pathetic arousal in seven of these 
measures than Wenger’s neurotic 
group. As Gunderson points out this 
indication of greater anxiety in the 
schizophrenic group may well not 
exist in more chronic patients. Gun- 
derson also divided the schizophrenic 
subjects into those that had improved 
the most and least with shock ther- 
apy and found the most improved 
group to show less general sympa- 
thetic arousal as measured by 
Wenger's autonomic balance score, 
the conclusion being that improve- 
ment had been accompanied by a 
decreased arousal. 

There are difficulties involved in 
comparing these studies in which 
anxiety is assumed to be present by 
virtue of a psychiatric diagnosis with 
those in which anxiety was produced 
experimentally. For example, if 
anger or annoyance does involve a 
distinctive arousal state and if such a 
state is present more often in some of 
these patient groups than in normals, 
a not unlikely assumption, then the 
pattern of mean scores may reflect a 
mixture of anxiety and anger as well 
as other arousal states. Nevertheless, 
many measures which belong to the 
epinephrine-like pattern of reaction 
are found to consistently discrimi- 
nate, with an occasional exception, 
between anxious patients and nor-
mals. By and large it would appear
that so-called resting state measures
discriminate between the patients
and normals as well, and in some
cases better, than do change scores 
associated with experimental stress. 
Some of the studies reporting change 
score results may be misleading since 
in most cases the patient groups 
start out with higher initial level 
scores. The high negative correlation 
between initial level and the magni- 
tude of the change score that prevails 
for most autonomic measures might 
well obscure some real differences 
that would have emerged if this cor- 
relation had been partialed out by 
some procedure such as Lacey’s
(1956) autonomic lability score. 
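
A minimal sketch of what partialing out initial level does to a change score is given below. It is not a reproduction of Lacey's autonomic lability score, which is scaled differently; it only regresses the stress level on the initial level and keeps the residual. The data are synthetic and purely illustrative, and the names `pearson_r` and `residualized_change` are assumptions introduced for the example.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residualized_change(initial, stress):
    """Stress level with its linear regression on initial level removed."""
    mx, my = mean(initial), mean(stress)
    slope = (sum((x - mx) * (y - my) for x, y in zip(initial, stress))
             / sum((x - mx) ** 2 for x in initial))
    return [(y - my) - slope * (x - mx) for x, y in zip(initial, stress)]


# Synthetic, purely illustrative data (not taken from any study cited here):
# subjects with higher resting levels have less room to rise under stress.
random.seed(0)
initial = [random.gauss(70, 10) for _ in range(200)]
stress = [40 + 0.5 * x + random.gauss(0, 5) for x in initial]

raw_change = [s - i for s, i in zip(stress, initial)]
adjusted = residualized_change(initial, stress)

r_raw = pearson_r(initial, raw_change)     # negative, given how the data were built
r_adjusted = pearson_r(initial, adjusted)  # zero apart from rounding error
print(round(r_raw, 2), abs(r_adjusted) < 1e-9)
```

In such data the raw change score is negatively related to initial level, so a group that starts higher will tend to show smaller raw changes for that reason alone; the residualized score is uncorrelated with initial level by construction.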

It is also possible that the particu- 
lar pattern of autonomic responses 
associated with an immediate threat 
situation is different from the “steady 
state” pattern of more chronically 
elevated responses found in many 
psychiatric patients. It is interesting 
in this regard that Wenger (1957) 
in recent pattern analyses of the data 
in his various samples reports not 
only patterns of sympathetic and 
parasympathetic dominance but a 
pattern composed of a mixture of 
sympathetic and parasympathetic 
type of responses. This latter pat- 
tern, which Wenger calls the B pat- 
tern, consists of three sympathetic 
type tendencies, high heart rate, 
high systolic blood pressure, and low 
salivary output; and two character- 
istics of parasympathetic innervation 
or lack of sympathetic arousal, high 
finger temperature and low palmar 
conductance. The sympathetic pat- 
tern occurred more frequently in 
neurotic and schizophrenic samples 
than in the normal group, but not 
more frequently in the operational 
fatigue or a psychosomatic sample 
than in the normal group. The B 
pattern occurred more frequently in 
all of the four psychiatric groups than 
in the normal group. Perhaps this B 
pattern represents a more chronic 
result of psychological stress which
could be distinguished from the anxi-
ety state as presently conceived.
Such an interpretation is consistent 
with the common clinical view that 
psychosomatic symptoms frequently 
serve an anxiety reducing function. 
It is also noteworthy that the findings 
of Sherman and Jost (1942) and 
Jurko, Jost, and Hill (1952) of low 
resting level palmar conductance in a
pattern otherwise suggestive of sym-
pathetic activation in neurotic pa-
tients are consistent with the existence
of Wenger’s B pattern.

To carry speculation a bit further 
in this area, it may be that there are 
systematic differences in response 
patterns as a function of the chronic- 
ity of the stress, as suggested by 
Selye (1950). Thus the pattern(s) of 
immediate change scores associated 
with discrete stimuli (electric shock 
or a threatening word) may be differ- 
ent from the pattern(s) of response 
associated with a stress of longer dura- 
tion but still essentially temporary 
or situational (oral examination, the 
general situation in an electric shock 
experiment, or an appointment for a 
first psychotherapy hour), where the 
change scores would have to be based 
upon measures obtained at some 
more relaxed time. And both of the 
above kinds of patterns might differ 
from patterns of response resulting 
from stress continuing over months or 
years as would be the case with psy- 
chiatric patients. The distinctive 
characteristics of responses associated 
with the second as opposed to the 
first type of stress may result from 
humoral effects being added to the 
more direct and shorter latency 
effects of autonomic nervous system 
stimulation. 

There have been several other ap- 
proaches to the physiological assess- 
ment of anxiety employing measures 
less readily obtainable and also less 
amenable to continuous recording 
than most of the ones considered
above. Ulett, Gleser, Winokur, and
Lawler (1953) and Shagass (1955b) 
report that the EEG of anxious pa- 
tients can be more readily “driven” 
at higher frequencies than is the case 
for normals or less anxious patients. 
There was no tendency for the aver- 
age undriven alpha frequency to be 
different for the groups. Shagass 
(1955a) further reports that changes 
in the driven EEG frequency corre- 
spond to changes in anxiety level for 
the same person measured at different 
times. 

Sedation threshold is also reported 
by Shagass (1954) and Shagass and 
Naiman (1955) to be related to anxi- 
ety level in patients. Basowitz, 
Persky, Korchin, and Grinker (1955) 
find more hippuric acid in the urine 
of paratrooper trainees assessed to be 
anxious than those not anxious, and 
also more in anxiety neurotics than in 
normals. 


Physiological Measures: Intercorrelations


On the basis of the research just 
summarized one might assume that 
many of the measures found to be re-
lated to experimentally induced or
clinically assessed anxiety would
show substantial intercorrelations. 
Research thus far gives little ground 
for optimism that these variables will 
correlate very highly, if at all. How- 
ever, it should be pointed out that 
there are few researches that provide 
much direct evidence on the ques- 
tion: namely, correlations among 
changes in measures obtained under 
resting and a clearly fear or anxiety 
arousing situation. Ax (1953) inter- 
correlated the seven physiological 
change scores that significantly dis- 
criminated between the fear and 
anger conditions. The intercorrela- 
tions of these scores under the anger 
condition tended to be higher than 
for the fear condition. The correla-
tions were for the most part insignifi-
cant for fear. Schachter (1957) did
not report intercorrelations among 
his measures but did find significantly 
more variability among the measures 
under fear than anger. Lewinsohn 
(1956) likewise reported intercorrela- 
tions among his four variables for 
base level scores, for change scores to 
the cold pressor test, and for change 
scores to the failure-criticism condi- 
tion. Only a few correlations were 
significant, probably not more than 
could have occurred by chance. Terry 
(1953) intercorrelated a number of 
physiological change scores associ- 
ated with doing arithmetic problems 
under distracting noise conditions. 
The intercorrelations between differ- 
ent autonomic systems were very 
low and for the most part insignifi- 
cant. Only measures of closely re- 
lated functions, such as systolic and 
diastolic blood pressure, correlated to 
any degree. It is possible that the 
stress condition was not particularly 
anxiety arousing for most subjects. 

Sherman and Jost (1942) in con- 
trast to the above studies did find a 
number of significant correlations 
among their physiological variables 
for neurotic and normal children 
combined. Although their correlation 
matrix is based on a mixture of abso- 
lute level scores, percent change 
scores, and scores obtained at differ- 
ent points in a sequence of seven 
conditions, there does seem to be a 
cluster of fairly highly intercorre- 
lated variables suggesting some 
arousal dimension. The measures 
most highly intercorrelated are hand 
tremor, percent heart rate change, 
percent alpha dominance (negative 
correlations), and respiratory vari- 
ability. Weybrew (1959) intercorre- 
lated 12 physiological change scores 
and 4 personality ratings. The physi- 
ological measures were obtained be- 
fore and after the subjects were 
subjected to a standardized situa-
tional stress. Correlations were in
general low among the physiological
change scores, and the results of a 
factor analysis were not easy to 
interpret. 

There are just not enough studies 
with enough significant correlations 
between change scores to attempt 
any generalizations from the results. 
A general problem encountered in 
working with autonomic change scores
is with respect to the type of trans- 
formation, if any, to use. Correla- 
tions, for example, among Lacey’s 
(1956) autonomic lability scores 
would appear to provide a more 
meaningful picture of the tendency of 
measures to covary than would be 
obtained by using absolute change, 
percentage change, or most other 
transformations, since as previously 
mentioned Lacey’s score more ade- 
quately partials out the usual high 
negative correlation between change 
and initial level. The degree to which 
correlations among autonomic change 
scores can be affected by partialing 
out the correlation with initial level 
is shown in the results of Mandler 
and Kremen (1958). They intercor- 
related scores obtained under a failure 
stress condition from five different 
response systems (GSR, heart rate, 
respiration, face temperature, and 
blood volume) including in some cases 
absolute change scores along with 
Lacey’s autonomic lability score. 
Absolute heart rate change yielded a 
correlation of .27 with change in 
respiration rate, whereas heart rate 
with initial level partialed out yielded 
a correlation of —.17; or in another 
case absolute heart rate change cor- 
related only .02 with inspiration 
amplitude (with initial level of inspi- 
ration amplitude partialed out) but 
heart rate with initial level partialed 
out correlated .:i with the same 
measure. It is clear that correlations
among autonomic measures will be
greatly affected by the way in which
the relation to initial level is handled.
The findings of Lacey (1950),
Lacey and Van Lehn (1952), Lacey, 
Bateman, and Van Lehn (1953), even 
though based on stressors that for the 
most part cannot be accepted as 
clearly anxiety arousing, provide such 
a strong argument for individual pat- 
terns of autonomic response that they 
should not be ignored in this context. 
Using various samples (college stu- 
dents and mothers of children in the 
Fels longitudinal research program)
and various stressors (cold pressor 
test, hyperventilation, mental arith-
metic, and word fluency), Lacey et al.
(1953) find that different subjects 
have different patterns of autonomic 
response which are reproducible over 
time and are consistent over these 
different stressors. Thus one subject 
may respond to the stress by a large 
increase in heart rate and only a small 
increase in skin conductance and an- 
other may respond with the opposite 
pattern. To the extent that such find- 
ings can be generalized to a clearly 
fear arousing situation the conclusion 
is clear that one cannot expect inter- 
correlations among autonomic change 
scores to be very substantial. The 
point to be emphasized here, how- 
ever, is not that several autonomic 
measures might not for almost all 
people increase under anxiety arous- 
ing circumstances, but that those 
measures which show the most or 
least increase vary from person to 
person. Such a state of affairs is not 
necessarily disastrous to one inter-
ested in using physiological measures
in assessing anxiety. The moral, how- 
ever, remains clear that for a given 
individual some physiological meas- 
ures may be much more sensitive 
indicators of change in anxiety level 
than others. 

A somewhat similar point of view 
is espoused by Malmo, Shagass, and 
Davis (1950) in which they propose 
the principle of symptom specificity: 
namely, that psychiatric patients
are inclined to respond to stress of all
kinds by a particular physiological 
mechanism that leads to the par- 
ticular kind of somatic complaint 
that the patient may have. Thus, 
Malmo and Shagass (1949b) found 
that patients with heart complaints 
showed greater heart rate and heart 
rate variability under stress than 
patients without heart complaints. 
Specificity of muscle potential reac- 
tion was demonstrated by Malmo, 
Smith, and Kohlmeyer (1956) who 
showed that for the same patient 
discussion of hostility conflicts was 
associated with increased forearm 
muscle tension and discussion of sex 
conflict was associated with increased 
leg muscle tension. 

There are other studies in which 
intercorrelations among a number of 
physiological measures are reported, 
such as Wenger (1942, 1948) or 
Gunderson (1953) in which all meas- 
ures were obtained under resting 
conditions. If people manifest vary-
ing degrees of an autonomic response
pattern determined by the amount of 
anxiety that they “bring into” the
resting situation then such a pattern 
should show up as a cluster of inter-
correlated variables. Wenger’s (1942)
earlier factor analytic work with
children did yield a dimension that he 
called the autonomic factor, which 
when unbalanced in the sympathetic 
direction would appear to be similar 
to the cluster of autonomic measures 
associated with experimentally 
aroused anxiety in the previously 
described studies. However, in
Wenger’s (1948) study of aviation 
cadets, operational fatigue patients, 
and neurotic patients the case for a 
clear-cut autonomic factor is shaky. 
The most striking thing about the 
reported intercorrelations is their
extremely low level. Very few corre-
lations are higher than .15. Gunder-
son (1953), however, reported inter-
correlations among his 12 resting
state measures on a subsample of
44 paranoid schizophrenics that were 
both substantial, for this kind of 
data, and pervasive. There was a 
tendency for many of the different 
autonomic measures to correlate be- 
tween .20 and .45 in a direction con- 
sistent with degree of sympathetic 
arousal. 

In summary, intercorrelations
among physiological measures ob- 
tained under either resting states or 
under stress tend to be low and fre- 
quently insignificant. There are few 
studies, however, in which a variety 
of measures are obtained under a 
clearly fear arousing situation and 
where the tendency of change scores 
to correlate with initial level has been 
partialed out. Improved measure- 
ment technique may also make some 
of the older studies somewhat obso- 
lete. Nevertheless, the best guess on 
the basis of present findings is that 
intercorrelations among physiological 
measures will be found to be low even 
with the above-mentioned modifica- 
tions taken into account. Lacey’s 
work suggests, consistent with the 
findings of low intercorrelations, that 
an individual responds to stress with 
a characteristic pattern of responses. 
This finding may not be entirely 
inconsistent with the possibility of 
there being some pattern of response 
usually associated with fear. For 
example, Lacey’s findings that sub- 
jects showed different response pat- 
terns to the stress of doing mental 
arithmetic may result in part from 
the fact that some subjects were made 
angry in the situation and some were 
made anxious, and that those that 
were made anxious showed a distinc- 
tive pattern from those made angry 
as Funkenstein et al. found. The 
chances are that this explanation 
does not account for all the individual 
response patterns, and it may be that 
among subjects made anxious there 
still remain different response pat-
terns. The meaning of these different
response patterns, which could be 
few in number, may be clarified by 
further knowledge about their cor- 
relation with behavioral and perhaps 
self-report type measures. It is, of 
course, possible that future factor 
analytic or pattern analysis studies 
will suggest the utility of conceptual- 
izing several different kinds of anxiety 
states. 


Behavioral Measures: Experimental 
and Group Comparisons 


The same question is asked here as 
was asked with respect to physiologi-
cal measures: is there some pattern of
behavioral effects associated with 
anxiety that can be distinguished 
from behavioral effects resulting from 
other arousal states? The researches 
most relevant to the question are those 
in the general area of the effects of 
stress on performance. These re-
searches, unfortunately, do not pro-
vide a clear answer to the question
because of two major lacks. First,
most such studies tend to be limited 
to one dependent variable for the good 
reason that it is much more difficult 
to measure simultaneously a variety 
of appropriate behavioral responses 
than physiological responses. Second, 
few studies attempt to contrast a fear 
arousal state with other kinds of 
arousal states. Another general draw- 
back to most behavioral measures for 
the purposes of assessment, as will be 
shown in the studies reviewed, is that 
their relation to the anxiety con- 
struct is not a monotonic one; for 
example, a low score on a certain 
performance may be associated with 
a very low or very high state of anxi- 
ety. The studies mentioned below, 
then, can be seen as only suggestive 
of measures likely to be especially 
sensitive to the effects of anxiety, 
and are not intended to represent an 
extensive coverage of the research on 
the effects of stress on performance.
Summaries of research in this area
are provided by Hanfmann (1950), 
Lazarus, Deese, and Osler (1952), 
and more recently Easterbrook 
(1959). 

A loose empirical generalization 
that emerges from studies in this 
area is that the kinds of tasks most 
likely to be affected by stress are 
learning and memory tasks involving 
novel or relatively poorly learned re- 
sponses where incorrect competing 
responses are both numerous and 
relatively strong; or perceptual tasks 
in which conditions are imposed that 
make appropriate discriminations dif- 
ficult. Thus, failure stress (usually 
produced by first ego involving, then 
failing, and then criticizing the sub- 
ject) has been shown to impair digit 
span but not vocabulary items (Mol- 
dowsky & Moldowsky, 1952); impair 
recall of incidental learning but not 
recall of material explicitly instructed 
to be learned (Aborn, 1953); and im- 
pair relearning of a serial list of non- 
sense syllables (Smith, 1954). Stress 
imposed by implying that the subject 
is neurotic or maladjusted on the 
basis of projective test responses has 
been found to impair performance on 
abstract reasoning, the Holsopple 
Concept Formation Test, and mirror 
tracing (Beier, 1951); and to produce 
more perseveration of incorrect re- 
sponses on the Luchins Water Jar 
Task (Cowen, 1952). 

A number of studies in which anxi- 
ety is introduced by separating sub- 
jects into high and low anxiety groups 
on the basis of the Taylor MAS (1953) 
provide evidence not only that the 
detrimental effect of anxiety becomes 
greater as the strength and number of 
incorrect competing responses in- 
volved in the task increases, but also, 
for the levels of anxiety involved, 
that performance is enhanced for the 
high anxiety subjects on some tasks 
when the correct response is very 
dominant. The incorrect competing
responses are usually introduced by
increasing the similarity and some- 
times also by decreasing the associa- 
tion value of items in a serial learning 
task, or by both increasing intralist 
similarity and decreasing similarity 
between pairs in a paired associate 
learning task. Lucas (1952), Mon- 
tague (1953), Farber and Spence 
(1953), Lazarus, Deese, and Hamilton 
(1954), Taylor and Chapman (1955), 
Spence, Farber, and McFann (1956), 
and Spence, Taylor, and Ketchel 
(1956), all reported evidence for this 
relationship. The findings of greater 
ease of eyeblink conditioning in 
groups of high anxious as opposed to 
low anxious subjects (Spence & 
Farber, 1953; Spence & Taylor, 1951; 
Taylor, 1951) are also consistent with 
this general proposition. 

Lucas (1952) also studied the effect 
of experimentally induced failure 
upon performance as a function of the 
strength of the incorrect competing
responses (manipulated by varying
the number of duplications of conso- 
nants in a series of consonants being 
used in an immediate recall task). 
He found no main effect associated 
with number of duplications nor any 
interaction with four degrees of ex-
perimentally induced failure. No
other studies were found in which 
anxiety was induced experimentally 
and its effect upon performance 
studied where the strength of the in- 
correct responses was systematically 
varied within the confines of the 
same task. 

A few studies have made use of real 
life stress situations that probably 
meet the need for a really anxiety 
arousing condition better than the 
experimental procedures used in the 
other studies. Beam (1955) obtained 
measures before doctoral oral exami- 
nations and opening night perform- 
ances in plays as well as at a less 
stressful period in the subject’s life, 
and found marked impairment in
learning a serial list of nonsense
syllables, and an increase in palmar 
sweat and GSR conditioning rate 
under stress as compared to nonstress. 
Basowitz et al. (1955) reported a 
tendency for digit span to be im- 
paired for soldiers undergoing para- 
troop training as compared to a con- 
trol group, and Wright (1954) like- 
wise found impairment in digit span 
in patients confronted with the threat 
of surgery. 

One kind of behavioral measure 
that would appear promising from an 
assessment point of view is speech 
disturbance. Mahl (1956, 1959) has 
developed a system for reliably scor- 
ing speech disturbances of various 
kinds and has shown certain of these 
disturbances to be related to varia- 
tion in anxiety as assessed in psy- 
chotherapeutic interviews. Dibner 
(1956) has employed a similar meas- 
ure. 

In the perceptual area Postman 
and Bruner (1948) reported impair- 
ment in the tachistoscopic perception 
of three-word sentences under failure 
stress. Rosenbaum (1953) found 
greater stimulus generalization under 
strong shock than weak shock. 
Smock (1957) reported greater in- 
tolerance of ambiguity in a percep- 
tual task under stress than nonstress. 
Korchin and Basowitz (1954), and 
Moffitt and Stagner (1956) found in- 
creased perceptual closure during 
paratroop training and experimental 
threat, respectively. 

In studies using group comparisons 
Angyal (1948) found more impair- 
ment in the recognition of patterns 
of letters under brief exposure condi- 
tions in high anxiety patients than 
other patients. Krugman (1947) and 
Goldstone (1955) found the threshold 
for flicker fusion to occur at a lower 
frequency for anxious than non- 
anxious subjects. 

Eriksen and Wechsler (1955) in-
geniously attempted to separate the
effects of anxiety (shock induced) on
response processes as opposed to 
sensory discrimination, and con- 
cluded that anxiety results in re- 
stricted and stereotyped response 
preferences but does not impair 
sensory discrimination. 

In the studies reviewed so far in 
this section the effect of stress has 
been in general to impair perform- 
ance. There are many studies, how- 
ever, in which improved performance 
is associated with stress. Thus, 
Steisel and Cohen (1951) and Truax 
and Martin (1957) found improved 
performance on simple arithmetic 
problems as a result of failure stress; 
and Spence (1957) found better re- 
call of words failed on an anagrams 
task than words successfully com- 
pleted. 

Likewise studies in which groups 
have been divided on the basis of 
self-report measures of general anxi- 
ety level also indicate that failure 
stress can lead to improved perform- 
ance for some subjects. Thus Lucas 
(1952), Waterhouse and Child (1954), 
Williams (1955), and Sarason (1956) 
found that low anxiety subjects tend 
to improve under stress and high 
anxiety subjects tend to show im- 
pairment under stress. 

Thus, to the extent that failure 
stress arouses anxiety, this construct 
appears to be associated with both 
improvement and impairment of 
performance. These seemingly con- 
tradictory findings are in part recon- 
ciled in a study by Stennett (1957), 
who instead of employing just one 
stress and one nonstress condition 
attempted to set up four degrees of 
intensity of motivation. He found 
that tracking performance improved 
at first as the rewards for correct 
performance increased but then 
showed impairment under the most 
extreme condition involving a large 
bonus for high level performance and 
threat of electric shock if this level 
was not reached. He also obtained
palmar conductance and muscle po- 
tential measures on his subjects and 
found these measures to increase 
monotonically as a function of
increased “motivation.” Several
authors, consistent with this study and
the others previously described, have
proposed that adequacy of performance
is an inverted-U-shaped function
of some arousal, activation, or
emotional state (for example,
Woodworth and Schlosberg, 1954; Malmo,
1957).
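
The inverted-U relation can be made concrete with a small numerical
sketch. The quadratic form, the optimum of 0.6, and the four arousal
levels are illustrative assumptions, not values reported by Stennett
(1957) or the other authors cited; the only point is that performance
can first rise and then fall while arousal itself rises steadily.

```python
def performance(arousal, optimum=0.6, peak=1.0):
    """Hypothetical inverted U: performance is highest at `optimum` arousal."""
    return peak - (arousal - optimum) ** 2

# Four increasing degrees of arousal (arbitrary units, assumed for illustration).
levels = [0.1, 0.4, 0.7, 1.0]
scores = [performance(a) for a in levels]
```

With these assumed values the score improves across the first three
levels and drops at the most extreme one.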

Thus there appear to be two rather 
loose empirical generalizations that 
can be reached on the basis of the 
studies reviewed in this section: 
(a) that tasks involving relatively 
stronger and more numerous compet- 
ing responses are more subject to the 
impairing effects of stress, and (b)
that increasing stress results in improved
performance up to a point and im- 
pairment thereafter. There is no 
particular evidence in this area to 
warrant the separation of anxiety as 
a construct from other more general 
constructs such as “arousal,”
“activation,” or “drive.”

Somewhat differing theoretical for- 
mulations have been proposed to 
account for the empirical generaliza- 
tions described. Easterbrook (1959) 
makes a plausible case for the idea 
that many of the disorganizing effects 
of emotion can be accounted for on 
the basis of cue utilization: namely, 
that increased “drive” or “emotion”
leads to a constriction of the percep- 
tual field or decrease in the number of 
cues that can be attended to. The 
Iowa theorists, on the other hand 
(Spence, 1958), employ the concept 
of drive and its hypothesized multi- 
plicative relationship to habit 
strength to account for many of the 
effects of stress on performance; and 
Child (1954), Child and Waterhouse 
(1953), and Sarason, Mandler, and 
Craighill (1952) emphasize the
irrelevant competing responses
specifically associated with stress on the
basis of past learning.
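
The drive account can be illustrated with a short sketch. It takes
the multiplicative relation named above at face value (strength of a
response = drive times habit strength) and adds a simplifying
assumption that is not in the source: that performance depends on the
margin of the correct response over its strongest competitor. All
habit values are hypothetical.

```python
def excitatory_strength(drive, habit):
    """Multiplicative rule: strength of a response = drive x habit strength."""
    return drive * habit

def correct_margin(drive, habit_correct, habit_competing):
    """Margin of the correct response over its competitor (simplifying assumption)."""
    return (excitatory_strength(drive, habit_correct)
            - excitatory_strength(drive, habit_competing))

# Simple task: the correct response is already dominant (hypothetical habits).
simple_low = correct_margin(1.0, 0.8, 0.2)
simple_high = correct_margin(2.0, 0.8, 0.2)

# Complex task: a competing response is stronger than the correct one.
complex_low = correct_margin(1.0, 0.3, 0.5)
complex_high = correct_margin(2.0, 0.3, 0.5)
```

Under these assumptions, raising drive widens the margin in whichever
direction it already points: the simple task benefits and the task
with a stronger competing response suffers.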

If anxiety proves to be a distin- 
guishable arousal state, research on 
its effects on performance would be 
greatly facilitated if it could be 
assessed independently, perhaps by 
physiological measures, from the 
performance being studied. The 
utility of this approach is shown in 
Stennett’s study, where it was not 
necessary to assume that experi- 
mental conditions were effective, or 
to rely upon some paper and pencil 
measure in determining the presence 
or magnitude of the motivational or 
emotional arousal state, but where 
instead the palmar conductance and 
muscle potential measures provided 
more direct evidence of the degree of 
arousal. 

In summary, no studies were dis- 
covered in which several objectively 
measured behavioral characteristics 
were obtained simultaneously (or al- 
most so) with a variety of physiologi- 
cal measures under conditions likely 
to be very fear arousing; much less, 
studies that in addition contrasted 
different types of arousal states. On 
the basis of the one and two variable 
type studies, though, it seems likely 
that some fairly simple learning, im- 
mediate memory, or perceptual tasks 
could be developed that would be 
sensitive to changes in anxiety level. 
It is possible that a few such tasks 
along with physiological measures 
could in the future help define more 
clearly the anxiety response pattern. 
In general, though, improved methods
of continuous anxiety measurement
will probably contribute more
to the study of the effects of anxiety
on behavior than vice versa.


Behavioral Measures: Intercorrelations


Studies oriented toward assessing 
the intercorrelations among a number
of behavioral manifestations of
anxiety are beset by a special prob- 
lem. Physiological measures can 
usually be obtained simultaneously 
but many behavioral effects of anxiety 
can be assessed only by presenting 
the subject with a series of tasks to 
perform. Unknown order effects may 
well distort the obtained correlations. 

There have been several studies of 
this type in which a number of be- 
havioral measures, selected on the 
basis of previously reported relation- 
ships to anxiety, were intercorre- 
lated. Martin (1958, 1959) in two 
successive studies using college sub- 
jects, found the intercorrelations to 
be quite low, but a factor analysis 
still suggested the presence of a 
dimension that might be labeled
anxiety. In the second study some of 
the measures that had the higher 
loadings on the anxiety factor were 
the Taylor MAS, .41; time to learn a 
complex (five choice) verbal maze, 
.40; errors in learning of paired
associate nonsense syllables with high
intralist similarity but low similarity 
between pairs, .39; tremors on a 
manual dexterity task, .39; an anxi- 
ety check list, .27. A simple verbal 
maze (two choice) and a paired as- 
sociate list involving low intralist 
similarity and high similarity be- 
tween pairs had zero order loadings 
on the factor. The loadings with re- 
spect to the two kinds of paired as- 
sociate lists and the two kinds of 
verbal mazes are consistent with the 
notion that tasks involving stronger 
competing responses are more sensi- 
tive to the effects of anxiety. A some- 
what more prominent factor that 
also emerged in both studies was 
interpreted as a motivational factor, 
that is, a dimension reflecting how 
hard these college subjects tried on a 
number of the tasks. Such individual 
differences in motivation were postu- 
lated to be relatively independent of 
the subjects’ anxiety level. A third 
factor of some generality was
identified as intelligence, and yet another
factor was entirely defined by self- 
report measures of anxiety such as 
the Taylor MAS. Thus performance 
on a given task such as learning 
paired associate nonsense syllables 
with high intralist similarity under 
mild stress was found not only to be 
affected by individual differences in 
anxiety but also by individual differ- 
ences in motivation, intelligence, and 
a factor specific to the type of task. 
Under these circumstances it is easy 
to see how anxiety variance could 
frequently be masked by other fac- 
tors. 
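
A rough sense of how easily anxiety variance can be masked comes from
squaring the loadings reported above. The sketch assumes orthogonal
factors (an assumption; the rotation used is not stated here), in
which case a squared loading is the proportion of a measure's variance
attributable to the factor. The verbal maze loading is taken as .40.

```python
# Loadings on the anxiety factor as reported above for Martin's second study.
loadings = {
    "Taylor MAS": 0.41,
    "complex verbal maze, time to learn": 0.40,
    "paired associates, high intralist similarity, errors": 0.39,
    "tremors, manual dexterity task": 0.39,
    "anxiety check list": 0.27,
}

# Under the orthogonality assumption, loading ** 2 is the share of variance
# in each measure that the anxiety factor accounts for.
variance_share = {name: round(value ** 2, 3) for name, value in loadings.items()}
largest_share = max(variance_share.values())
```

Even the highest loading leaves more than four-fifths of that
measure's variance to other sources.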

Rosenthal (1955), Cattell and 
Gruen (1955), and Scheier and Cat- 
tell (1958) reported several factor 
analytic studies in which a variety of 
self-report, behavioral, and, in some 
cases, physiological measures were 
obtained. They found a factor, which 
they label anxiety, emerging in all 
their studies that is separable from a
number of other personality factors
after relatively blind rotations to 
oblique simple structure. The above 
studies employed substantial Ns in 
five different samples of subjects
involving USAF pilot trainees, children,
and college students. Upon
inspection of the factor loadings on 
the anxiety factor in these various 
studies as summarized by Cattell 
and Scheier (1958b) it becomes ap- 
parent, however, that the only meas- 
ures with high loadings and the only 
measures whose loadings are con- 
sistent from study to study are those 
based on self-report type measures. 
Few if any behavioral-physiological
measures have loadings over .30 and 
none of those that do are substan- 
tiated in any of the other samples. 
For example, in Rosenthal’s study 
(1955) the three highest loadings on 
the anxiety factor were Taylor MAS, 
.85; questionnaire measure of anxious 
insecurity, .84; and a questionnaire
measure of nervous tension, .70. The 
other four measures with loadings 
above .30 were also self-report type 
measures. Rosenthal obtained sev- 
eral physiological measures under 
various conditions (GSR, heart rate, 
salivary volume, systolic blood pres- 
sure) and none of these were related 
to this anxiety factor to any degree. 
Under these circumstances it does 
not seem reasonable to accept this 
factor as necessarily assessing the 
hypothetical anxiety reaction as for- 
mulated in this paper. 

Cattell and Scheier (1958b) distinguish
between the “trait” of anxiety,
inferred from factor analysis of a
cross section of measures obtained
only once on each subject, and the
“state” of anxiety inferred from a
factor analysis of change scores from 
one testing time to another. Cor- 
relating change scores in this way is 
referred to as incremental R technique,
and Cattell and Scheier
(1958a) report in detail the results of 
such a study. An interesting innova- 
tion in this study, too involved to go 
into in this paper, was the introduction
of different “treatment” conditions
into a correlational study, so
that it was possible to see, for ex- 
ample, how imminence of academic 
examinations correlated with the 
other variables. One of the resulting 
14 factors was identified as the 
“state” anxiety factor and appears
to represent an arousal state more 
closely related to the present theo- 
retical view of anxiety than the previ- 
ously found trait factor. The self- 
report measures did not dominate 
the loadings so much, although the 
two highest loadings were self-report 
measures involving an anxiety-ten- 
sion check list, .41, and a question- 
naire scale of tension, .40. In addi- 
tion though, systolic blood pressure 
had a loading of .30 and palmar conductance
of .26. Perhaps inconsistent
with this was the positive loading of 
volume of saliva, .27. The imminence 
of an academic examination was 
negatively loaded, -.25, suggesting
that just before an examination the 
usually anxious person becomes less 
anxious. The authors propose that 
“a person beset by vague fears and
anxieties loses these anxieties for a 
while when a real danger threatens.” 
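
The core step of correlating change scores can be sketched as
follows. The two variables are named after measures mentioned above,
but every number is invented for illustration, and the sketch stops at
a single correlation; a full analysis would factor the whole matrix of
such correlations.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def change_scores(first, second):
    """Per-subject change from the first testing occasion to the second."""
    return [b - a for a, b in zip(first, second)]

# Hypothetical scores for five subjects on two occasions (invented data).
checklist_t1 = [10, 12, 9, 15, 11]
checklist_t2 = [14, 13, 9, 20, 12]
systolic_t1 = [118, 122, 115, 130, 120]
systolic_t2 = [126, 123, 116, 141, 121]

d_checklist = change_scores(checklist_t1, checklist_t2)
d_systolic = change_scores(systolic_t1, systolic_t2)
r_change = pearson_r(d_checklist, d_systolic)
```

Because each subject serves as his own baseline, the correlation
reflects covariation in change from occasion to occasion rather than
stable differences between subjects.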

Holtzman and Bitterman (1956) 
intercorrelated 41 measures obtained 
on 135 cadets in an Air ROTC unit. 
These measures included ratings, 
personality tests, stress tests, per- 
ceptual tests, GSR conditioning, and 
amount of uric acid and glycine in 
the urine. The intercorrelations 
among the different kinds of meas- 
ures were quite low and a factor 
analysis yielded seven factors which 
were almost entirely determined by 
clusters of measures taken from the 
same test situation. 

There are some important limita- 
tions to the factor analytic approach 
to the study of anxiety. For example, 
there is no convincing logic to the 
supposition that simple structure, 
oblique or orthogonal, yields the most 
psychologically meaningful dimen- 
sions; although intuitively it would 
seem that some kind of oblique solu- 
tion would be more meaningful for 
separating out a cluster of
physiological-behavioral measures to be
identified as anxiety as opposed to clusters
of measures representing other arous- 
al states, since in all likelihood these 
various arousal states will be corre- 
lated. With respect to rotations in 
factor analytic studies perhaps it 
would be better if such rotations were 
not done blindly but with full knowl- 
edge of the nature of the measures, 
and the final rotation considered 
frankly for what it is, a post hoc 
hypothesis about the nature of the 
dimensions revealed. Confirmation 
of the interpretation of a given factor
and further elucidation of the con- 
struct validity (Cronbach & Meehl, 
1955) of the assessment procedures 
can then be ascertained by introduc- 
ing the factor as a variable in experi- 
mental research. 

Certainly the selection of measures 
to be intercorrelated affects the defi- 
nition of the resulting factors. For 
example, it may be that in the Cattell
studies just described, with the
exception of the incremental R technique
study, the high intercorrelations
among the self-report measures,
which almost entirely define
the anxiety factor, are due in part to
correlated nonanxiety variance. It is
also possible that many of the measures
used in the factor analytic
studies involve characteristic ways of
controlling or reducing anxiety rather 
than more direct manifestations of 
the anxiety itself. The Holtzman and 
Bitterman study serves to point up
the fact that in an area where correlations
between measures obtained
from different response systems are 
going to be low at best, including 
clusters of highly intercorrelated 
measures from the same response 
system or test situation will inevi- 
tably result in factors representing 
these clusters, at least when the com- 
mon criteria for simple structure are 
employed. It is possible that a factor 
analysis done under such conditions 
might serve to actually hide some 
real generalities of response, although 
there is no indication that such was 
the case in the Holtzman and Bitter- 
man study. 

One cannot conclude on the basis 
of the researches reviewed in this 
paper, despite many suggestive leads, 
that any clear-cut pattern of
physiological-behavioral responses associated
with anxiety arousal, distinguishable
from other arousal patterns,
has been demonstrated. The status 
of anxiety assessment procedures,
both in terms of experimental and
correlational findings, might be clarified
by combining some of the best
features of the researches described. 
First one might attempt to measure 
simultaneously, or nearly so, an ex- 
tensive battery of physiological meas- 
ures and a few selected behavioral 
measures at a time when the subject 
is relaxed. This would necessitate a 
preliminary adaptation-to-the-appa- 
ratus session. Then the subjects 
could be tested again under defi- 
nitely anxiety arousing circumstances, 
the more realistic the better. A
study of the change score patterns
and intercorrelations, after correcting
where necessary for correlation with
relaxed session levels, should provide 
evidence for an anxiety pattern if it 
exists. It would then be further 
necessary to demonstrate that the 
pattern of responses was distinguish- 
able from patterns associated with 
other arousal states such as general 
activation, anger, or sex; otherwise 
there is no utility in having a con- 
struct of anxiety separate from these 
others. 
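
One way the correction for relaxed-session level might be carried out
is to regress each change score on the relaxed-session value and keep
the residual. This is an assumed procedure offered as a sketch; the
text does not specify the correction, and the palmar conductance
values below are invented.

```python
def residualized_change(relaxed, aroused):
    """Change scores with the linear dependence on relaxed-session level removed."""
    n = len(relaxed)
    change = [a - r for r, a in zip(relaxed, aroused)]
    mean_r = sum(relaxed) / n
    mean_c = sum(change) / n
    # Least-squares slope of change on relaxed-session level.
    slope = (sum((r - mean_r) * (c - mean_c) for r, c in zip(relaxed, change))
             / sum((r - mean_r) ** 2 for r in relaxed))
    # Residual = observed change minus the change predicted from baseline.
    return [c - (mean_c + slope * (r - mean_r)) for r, c in zip(relaxed, change)]

# Hypothetical palmar conductance values for five subjects (invented data).
relaxed = [4.0, 6.0, 5.0, 8.0, 7.0]
aroused = [9.0, 9.5, 9.0, 10.5, 10.0]
corrected = residualized_change(relaxed, aroused)
```

The corrected scores are uncorrelated with the relaxed-session levels
by construction, so any pattern among them cannot be attributed to
where each subject started.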

When more is known about the 
physiological-behavioral response 
pattern associated with anxiety, then 
self-report scales can be constructed 
which will predict this response pat- 
tern in various situations. 